Certifying cost annotations in compilers



Roberto M. Amadio (1), Nicolas Ayache (2), Yann Régis-Gianas (2), Ronan Saillard (2)

(1) Université Paris Diderot (UMR-CNRS 7126)
(2) Université Paris Diderot (UMR-CNRS 7126) and INRIA (Team πr²)

October 11, 2010

Abstract 

We discuss the problem of building a compiler which can lift in a provably correct way pieces of information on the execution cost of the object code to cost annotations on the source code. To this end, we need a clear and flexible picture of: (i) the meaning of cost annotations, (ii) the method to prove them sound and precise, and (iii) the way such proofs can be composed. We propose a so-called labelling approach to these three questions. As a first step, we examine its application to a toy compiler. This formal study suggests that the labelling approach has good compositionality and scalability properties. In order to provide further evidence for this claim, we report our successful experience in implementing and testing the labelling approach on top of a prototype compiler written in OCaml for (a large fragment of) the C language.




1 Introduction 




The formal description and certification of software components is reaching a certain level of maturity, with impressive case studies ranging from compilers to kernels of operating systems. A well-documented example is the proof of functional correctness of a moderately optimizing compiler from a large subset of the C language to a typical assembly language of the kind used in embedded systems [9].

In the framework of the Certified Complexity (CerCo) project [3], we aim to refine this line of work by focusing on the issue of the execution cost of the compiled code. Specifically, we aim to build a formally verified C compiler that, given a source program, automatically produces a functionally equivalent object code plus an annotation of the source code which is a sound and precise description of the execution cost of the object code.

We target in particular the kind of C programs produced for embedded applications; these 
programs are eventually compiled to binaries executable on specific processors. The current 
state of the art in commercial products such as Scade [H [7] is that the reaction time of the 
program is estimated by means of abstract interpretation methods (such as those developed 
by AbsInt [HE]) that operate on the binaries. These methods rely on specific knowledge of the architecture of the processor and may require explicit annotations of the binaries to determine the number of times a loop is iterated (see, e.g., [15] for a survey of the state of the art).

In this context, our aim is to produce a functionally correct compiler which can lift in a provably correct way the pieces of information on the execution cost of the binary code to cost annotations on the source C code. Eventually, we plan to manipulate the cost annotations with automatic tools such as Frama-C [5]. In order to carry out our project, we need a clear and flexible picture of: (i) the meaning of cost annotations, (ii) the method to prove them sound and precise, and (iii) the way such proofs can be composed. Our purpose here is to propose a methodology addressing these three questions and to consider its concrete application to a simple toy compiler and to a moderately optimizing untrusted C compiler.

Meaning of cost annotations The execution cost of the source programs we are inter- 
ested in depends on their control structure. Typically, the source programs are composed of 
mutually recursive procedures and loops and their execution cost depends, up to some multi- 
plicative constant, on the number of times procedure calls and loop iterations are performed. 
Producing a cost annotation of a source program amounts to:

• enriching the program with a collection of global cost variables to measure resource consumption (time, stack size, heap size, ...);

• injecting suitable code at some critical points (procedures, loops, ...) to keep track of the execution cost.

Thus producing a cost annotation of a source program P amounts to building an annotated program An(P) which behaves as P while self-monitoring its execution cost. In particular,
if we do not observe the cost variables then we expect the annotated program An(P) to be 
functionally equivalent to P. Notice that in the proposed approach an annotated program is 
a program in the source language. Therefore the meaning of the cost annotations is automat- 
ically defined by the semantics of the source language and tools developed to reason on the 
source programs can be directly applied to the annotated programs too. 

Soundness and precision of cost annotations Suppose we have a functionally correct compiler $\mathcal{C}$ that associates with a program P in the source language a program $\mathcal{C}(P)$ in the object language. Further suppose we have some obvious way of defining the execution cost of an object code. For instance, we have a good estimate of the number of cycles needed for the execution of each instruction of the object code. Now the annotation of the source program An(P) is sound if its prediction of the execution cost is an upper bound for the 'real' execution cost. Moreover, we say that the annotation is precise with respect to the cost model if the difference between the predicted and real execution costs is bounded by a constant which depends on the program.

Compositionality In order to master the complexity of the compilation process (and its verification), the compilation function $\mathcal{C}$ must be regarded as the result of the composition of a certain number of program transformations $\mathcal{C} = \mathcal{C}_k \circ \cdots \circ \mathcal{C}_1$. When building a system of cost annotations on top of an existing compiler, a certain number of problems arise. First, the estimated cost of executing a piece of source code is determined only at the end of the compilation process. Thus, while we are used to defining the compilation functions $\mathcal{C}_i$ in increasing order (from left to right), the annotation function An is the result of a progressive abstraction from the object to the source code (from right to left). Second, we must be able to
foresee in the source language the looping and branching points of the object code. Missing a 
loop may lead to unsound cost annotations while missing a branching point may lead to rough 
cost predictions. This means that we must have a rather good idea of the way the source 
code will eventually be compiled to object code. Third, the definition of the annotation of the 
source code depends heavily on contextual information. For instance, the cost of the compiled 
code associated with a simple expression such as x + 1 will depend on the place in the memory 
hierarchy where the variable x is allocated. A previous experience described in [2] suggests 
that the process of pushing 'hidden parameters' in the definitions of cost annotations and of directly manipulating numerical costs is error-prone and produces complex proofs. For this reason, we advocate next a 'labelling approach' where costs are handled at an abstract level and numerical values are produced at the very end of the construction.

Labelling approach to cost annotations The 'labelling' approach to the problem of 
building cost annotations is summarized in the following diagram. 




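[Diagram: the languages $L_1, \ldots, L_{k+1}$ and their labelled extensions $L_{1,\ell}, \ldots, L_{k+1,\ell}$, related by the compilation functions $\mathcal{C}_i$, the erasures $\mathit{er}_i$, the labelling $\mathcal{L}$ and the instrumentation $\mathcal{I}$, subject to the following equations.]

$$\mathit{er}_{i+1} \circ \mathcal{C}_i = \mathcal{C}_i \circ \mathit{er}_i \qquad \mathit{er}_1 \circ \mathcal{L} = \mathrm{id}_{L_1} \qquad An = \mathcal{I} \circ \mathcal{L}$$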



For each language $L_i$ considered in the compilation process, we define an extended labelled language $L_{i,\ell}$ and an extended operational semantics. The labels are used to mark certain points of the control. The semantics makes sure that whenever we cross a labelled control point a labelled and observable transition is produced.

For each labelled language there is an obvious function $\mathit{er}_i$ erasing all labels and producing a program in the corresponding unlabelled language. The compilation functions $\mathcal{C}_i$ are extended from the unlabelled to the labelled language so that they commute with the erasure functions. Moreover, we lift the soundness properties of the compilation functions from the unlabelled to the labelled languages and transition systems.

A labelling $\mathcal{L}$ of the source language $L_1$ is just a function such that $\mathit{er}_1 \circ \mathcal{L}$ is the identity function. An instrumentation $\mathcal{I}$ of the source labelled language $L_{1,\ell}$ is a function replacing the labels with suitable increments of, say, a fresh global cost variable. Then an annotation An of the source program can be derived simply as the composition of the labelling and the instrumentation functions: $An = \mathcal{I} \circ \mathcal{L}$.

Suppose s is some adequate representation of the state of a program. Let P be a source program and suppose that its annotation satisfies the following property:

$$(An(P), s[c/\mathit{cost}]) \Downarrow s'[c+\delta/\mathit{cost}] \tag{1}$$

where c and $\delta$ are some non-negative numbers. Then the definition of the instrumentation and the fact that the soundness proofs of the compilation functions have been lifted to the labelled languages allow us to conclude that

$$(\mathcal{C}(\mathcal{L}(P)), s[c/\mathit{cost}]) \Downarrow (s'[c/\mathit{cost}], \lambda) \tag{2}$$

where $\mathcal{C} = \mathcal{C}_k \circ \cdots \circ \mathcal{C}_1$ and $\lambda$ is a sequence (or a multi-set) of labels whose 'cost' corresponds to the number $\delta$ produced by the annotated program. Then the commutation properties of erasure and compilation functions allow us to conclude that the erasure of the compiled labelled code $\mathit{er}_{k+1}(\mathcal{C}(\mathcal{L}(P)))$ is actually equal to the compiled code $\mathcal{C}(P)$ we are interested in. Given this, the following question arises: under which conditions is the sequence $\lambda$, i.e., the increment $\delta$, a sound and possibly precise description of the execution cost of the object code?

To answer this question, we observe that the object code we are interested in is some kind of assembly code and its control flow can be easily represented as a control flow graph. The fact that we have to prove the soundness of the compilation functions means that we have plenty of information on the way the control flows in the compiled code, in particular as far as procedure calls and returns are concerned. These pieces of information allow us to build a rather accurate representation of the control flow of the compiled code at run time.

The idea is then to perform two simple checks on the control flow graph. The first check is to verify that all loops go through a labelled node. If this is the case then we can associate a finite cost with every label and prove that the cost annotations are sound. The second check amounts to verifying that all paths starting from a label have the same cost. If this check is successful then we can conclude that the cost annotations are precise.
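The two checks can be illustrated on an explicit control flow graph. The Python sketch below is only an illustration under assumptions of ours: the graph is a dictionary `succ` mapping each node to its list of successors, `labelled` is the set of labelled nodes, the root is node 0, and the cost of a path is its number of nodes, with a path stopping just before a labelled node or a leaf (as in definition 14 of section 3.5). All function names are hypothetical.

```python
from functools import lru_cache

# Assumed encoding: succ maps node -> list of successors; labelled is a set of nodes.


def is_soundly_labelled(succ, labelled, root=0):
    """Root is labelled and every cycle of the graph goes through a labelled node."""
    if root not in labelled:
        return False
    colour = {}  # "grey" while a node is on the DFS stack, "black" once finished

    def finds_unlabelled_cycle(n):
        # DFS restricted to unlabelled nodes: a back edge closes an unlabelled loop.
        colour[n] = "grey"
        for m in succ[n]:
            if m in labelled:
                continue
            if colour.get(m) == "grey":
                return True
            if m not in colour and finds_unlabelled_cycle(m):
                return True
        colour[n] = "black"
        return False

    return not any(
        n not in labelled and n not in colour and finds_unlabelled_cycle(n)
        for n in succ
    )


def simple_path_costs(succ, labelled, start):
    """Costs (node counts) of the simple paths starting at the labelled node `start`.

    Only terminates on soundly labelled graphs (no unlabelled cycles).
    """

    @lru_cache(maxsize=None)
    def costs(n):
        result = set()
        for m in succ[n]:
            if m in labelled or not succ[m]:
                result.add(1)  # the path stops at n: next node is labelled or a leaf
            else:
                result |= {1 + c for c in costs(m)}
        return frozenset(result or {1})

    return costs(start)


def is_precisely_labelled(succ, labelled, root=0):
    """Soundly labelled, and all simple paths from each labelled node cost the same."""
    return is_soundly_labelled(succ, labelled, root) and all(
        len(simple_path_costs(succ, labelled, l)) == 1 for l in labelled
    )
```

For instance, in the graph `{0: [1], 1: [2, 4], 2: [3], 3: [4], 4: []}` with only node 0 labelled, `simple_path_costs` returns the two costs 2 and 4: the labelling is sound but not precise, which is the situation of an unlabelled conditional.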

A toy compiler As a first case study for the labelling approach to cost annotations we have sketched, we introduce a toy compiler which is summarised by the following diagram.

$$\mathrm{Imp} \xrightarrow{\ \mathcal{C}\ } \mathrm{Vm} \xrightarrow{\ \mathcal{C}'\ } \mathrm{Mips}$$

The three languages considered can be briefly described as follows: Imp is a very simple imperative language with pure expressions, branching and looping commands, Vm is an assembly-like language enriched with a stack, and Mips is a Mips-like assembly language with registers and main memory. The first compilation function $\mathcal{C}$ relies on the stack of the Vm language to implement expression evaluation, while the second compilation function $\mathcal{C}'$ allocates (statically) the base of the stack in the registers and the rest in main memory. This is of course a naive strategy, but it suffices to expose some of the problems that arise in defining a compositional approach.

A C compiler As a second, more complex, case study we consider a C compiler we have built in OCaml, whose structure is summarised by the following diagram:

C → Clight → Cminor → RTLAbs   (front-end)
                          ↓
Mips ← LIN ← LTL ← ERTL ← RTL  (back-end)

The structure follows rather closely that of the CompCert compiler [9]. Notable differences are that some compilation steps are merged, that the front-end goes down to RTLAbs (rather than Cminor), and that we target the Mips assembly language (rather than PowerPC). These differences are contingent on the way we built the compiler. The compilation from C to Clight relies on the CIL front-end [13]. The one from Clight to RTL has been programmed from scratch and it is partly based on the Coq definitions available in the CompCert compiler. Finally, the back-end from RTL to Mips is based on a compiler developed in OCaml for pedagogical purposes [14]. The main optimisations it performs are common subexpression elimination, liveness analysis and register allocation, and graph compression. We ran some benchmarks to ensure that our prototype implementation is realistic. The results are given in appendix B and the compiler is available from the authors.

Organisation The rest of the paper is organised as follows. Section 2 describes the 3 languages and the 2 compilation steps of the toy compiler. Section 3 describes the application of the labelling approach to the toy compiler. Section 4 reports our experience in implementing and testing the labelling approach on the C compiler. Section 5 summarizes our contribution and outlines some perspectives for future work. Appendix A sketches the proofs that have not been mechanically checked in Coq and appendix B provides some details on the structure of the C compiler we have implemented.

2 A toy compiler 

We formalise the toy compiler introduced in section 1.
2.1 Imp: language and semantics 

The syntax of the Imp language is described below. This is a rather standard imperative 
language with while loops and if-then-else. 



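$$\begin{array}{rcll}
\mathit{id} & ::= & x \mid y \mid \ldots & \text{(identifiers)}\\
n & ::= & 0 \mid -1 \mid +1 \mid \ldots & \text{(integers)}\\
v & ::= & n \mid \mathsf{true} \mid \mathsf{false} & \text{(values)}\\
e & ::= & \mathit{id} \mid n \mid e + e & \text{(numerical expressions)}\\
b & ::= & e < e & \text{(boolean conditions)}\\
S & ::= & \mathsf{skip} \mid \mathit{id} := e \mid S;S \mid \mathsf{if}\ b\ \mathsf{then}\ S\ \mathsf{else}\ S \mid \mathsf{while}\ b\ \mathsf{do}\ S & \text{(commands)}\\
P & ::= & \mathsf{prog}\ S & \text{(programs)}
\end{array}$$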

Let s be a total function from identifiers to integers representing the state. If s is a state, x an identifier, and n an integer, then $s[n/x]$ is the 'updated' state such that $s[n/x](x) = n$ and $s[n/x](y) = s(y)$ if $x \neq y$. The big-step operational semantics of Imp expressions and boolean conditions is defined as follows:

$$\frac{}{(v,s) \Downarrow v} \qquad \frac{}{(x,s) \Downarrow s(x)} \qquad \frac{(e,s) \Downarrow v \quad (e',s) \Downarrow v'}{(e+e',s) \Downarrow (v +_{\mathbb{Z}} v')} \qquad \frac{(e,s) \Downarrow v \quad (e',s) \Downarrow v'}{(e<e',s) \Downarrow (v <_{\mathbb{Z}} v')}$$

A continuation K is a list of commands which terminates with a special symbol halt: $K ::= \mathsf{halt} \mid S \cdot K$. Table 1 defines a small-step semantics of Imp commands whose basic judgement has the shape $(S,K,s) \to (S',K',s')$. We define the semantics of a program prog S as the semantics of the command S with continuation halt. We derive a big-step semantics from the small-step one as follows: $(S,s) \Downarrow s'$ if $(S,\mathsf{halt},s) \to \cdots \to (\mathsf{skip},\mathsf{halt},s')$.

2.2 Vm: language and semantics 

Following [10], we define a virtual machine Vm and its programming language. The machine includes the following elements: (1) a fixed code C (a possibly empty sequence of instructions), (2) a program counter pc, (3) a store s (as for the source program), (4) a stack of integers $\sigma$.

Given a sequence C, we denote with |C| its length and with C[i] its i-th element (the leftmost element being the 0-th element). The operational semantics of the instructions is formalised by rules of the shape $C \vdash (i,\sigma,s) \to (j,\sigma',s')$ and it is fully described in table 2.
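$$\begin{array}{llll}
(x := e, K, s) & \to & (\mathsf{skip}, K, s[v/x]) & \text{if } (e,s) \Downarrow v\\
(S;S', K, s) & \to & (S, S' \cdot K, s) & \\
(\mathsf{if}\ b\ \mathsf{then}\ S\ \mathsf{else}\ S', K, s) & \to & (S, K, s) & \text{if } (b,s) \Downarrow \mathsf{true}\\
(\mathsf{if}\ b\ \mathsf{then}\ S\ \mathsf{else}\ S', K, s) & \to & (S', K, s) & \text{if } (b,s) \Downarrow \mathsf{false}\\
(\mathsf{while}\ b\ \mathsf{do}\ S, K, s) & \to & (S, (\mathsf{while}\ b\ \mathsf{do}\ S) \cdot K, s) & \text{if } (b,s) \Downarrow \mathsf{true}\\
(\mathsf{while}\ b\ \mathsf{do}\ S, K, s) & \to & (\mathsf{skip}, K, s) & \text{if } (b,s) \Downarrow \mathsf{false}\\
(\mathsf{skip}, S \cdot K, s) & \to & (S, K, s) &
\end{array}$$

Table 1: Small-step operational semantics of Imp commands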



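$$\begin{array}{ll}
\text{Rule} & C[i] = \\ \hline
C \vdash (i, \sigma, s) \to (i+1, n \cdot \sigma, s) & \mathsf{cnst}(n)\\
C \vdash (i, \sigma, s) \to (i+1, s(x) \cdot \sigma, s) & \mathsf{var}(x)\\
C \vdash (i, n \cdot \sigma, s) \to (i+1, \sigma, s[n/x]) & \mathsf{setvar}(x)\\
C \vdash (i, n \cdot n' \cdot \sigma, s) \to (i+1, (n +_{\mathbb{Z}} n') \cdot \sigma, s) & \mathsf{add}\\
C \vdash (i, \sigma, s) \to (i+k+1, \sigma, s) & \mathsf{branch}(k)\\
C \vdash (i, n \cdot n' \cdot \sigma, s) \to (i+1, \sigma, s) & \mathsf{bge}(k) \text{ and } n <_{\mathbb{Z}} n'\\
C \vdash (i, n \cdot n' \cdot \sigma, s) \to (i+k+1, \sigma, s) & \mathsf{bge}(k) \text{ and } n \geq_{\mathbb{Z}} n'
\end{array}$$

Table 2: Operational semantics of Vm programs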

Notice that Imp and Vm semantics share the same notion of store. We write, e.g., $n \cdot \sigma$ to stress that the top element of the stack exists and is n. We will also write $(C,s) \Downarrow s'$ if $C \vdash (0,\epsilon,s) \to^* (i,\epsilon,s')$ and $C[i] = \mathsf{halt}$.
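The rules of table 2 translate almost literally into an interpreter. In this Python sketch (our own encoding, not part of the formal development) instructions are tuples such as `("cnst", 3)` or `("bge", 5)`, and the stack is a list whose first element is the top, matching the notation $n \cdot \sigma$. The function names are hypothetical.

```python
# Assumed encoding: ("cnst", n), ("var", x), ("setvar", x), ("add",),
# ("branch", k), ("bge", k), ("halt",); stack[0] is the top of the stack.


def vm_step(C, i, stack, s):
    """One transition C |- (i, stack, s) -> (j, stack', s') of table 2."""
    ins = C[i]
    op = ins[0]
    if op == "cnst":
        return i + 1, [ins[1]] + stack, s
    if op == "var":
        return i + 1, [s[ins[1]]] + stack, s
    if op == "setvar":
        n, *rest = stack
        return i + 1, rest, {**s, ins[1]: n}
    if op == "add":
        n, n2, *rest = stack
        return i + 1, [n + n2] + rest, s
    if op == "branch":
        return i + ins[1] + 1, stack, s
    if op == "bge":
        n, n2, *rest = stack
        # fall through when n < n', jump by k when n >= n'
        return (i + 1 if n < n2 else i + ins[1] + 1), rest, s
    raise ValueError(f"no transition from {ins!r}")


def vm_run(C, s):
    """(C, s) ⇓ s' iff C |- (0, ε, s) ->* (i, ε, s') and C[i] = halt."""
    i, stack = 0, []
    while C[i] != ("halt",):
        i, stack, s = vm_step(C, i, stack, s)
    if stack:
        raise ValueError("halted with a non-empty stack")
    return s
```

For example, the code cnst(3) · var(x) · bge(5) · var(x) · cnst(1) · add · setvar(x) · branch(−8) · halt, which is what table 4 produces for while x < 3 do x := x + 1, runs from a store with x = 0 to a store with x = 3.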

Code coming from the compilation of Imp programs has specific properties that are used 
in the following compilation step when values on the stack are allocated either in registers or 
in main memory. In particular, it turns out that for every instruction of the compiled code it 
is possible to predict statically the height of the stack whenever the instruction is executed. 
We now proceed to define a simple notion of well-formed code and show that it enjoys this 
property. In the following section, we will define the compilation function from Imp to Vm 
and show that it produces well-formed code. 

Definition 1 We say that a sequence of instructions C is well formed if there is a function $h : \{0, \ldots, |C|\} \to \mathbb{N}$ which satisfies the conditions listed in table 3 for $0 \leq i \leq |C|-1$. In this case we write $C : h$.
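Every condition of table 3 fixes h(i+1) (and, for jumps, h(i+k+1)) from h(i), so a candidate h can be computed in a single left-to-right pass starting from h(0) = 0. The following Python sketch assumes the same kind of tuple encoding of Vm instructions as above (e.g. `("bge", 5)`); the function name is hypothetical. It returns h as a list of length |C| + 1, or raises `ValueError` when some condition fails.

```python
def stack_heights(C):
    """Return h with C : h and h(0) = 0 (conditions of table 3), or raise ValueError."""
    h = [None] * (len(C) + 1)
    h[0] = 0

    def put(j, v):
        # record h(j) = v, rejecting conflicting requirements
        if h[j] is not None and h[j] != v:
            raise ValueError(f"conflicting heights at position {j}")
        h[j] = v

    for i, ins in enumerate(C):
        op, hi = ins[0], h[i]  # h[i] was fixed while processing position i - 1
        if op in ("cnst", "var"):
            put(i + 1, hi + 1)
        elif op == "add":
            if hi < 2:
                raise ValueError(f"add at {i} needs two operands")
            put(i + 1, hi - 1)
        elif op == "setvar":
            if hi != 1:
                raise ValueError(f"setvar at {i} needs height 1")
            put(i + 1, 0)
        elif op in ("branch", "bge"):
            target = i + ins[1] + 1
            if not 0 <= target <= len(C):
                raise ValueError(f"jump at {i} leaves the code")
            if hi != (0 if op == "branch" else 2):
                raise ValueError(f"bad height before jump at {i}")
            put(i + 1, 0)
            put(target, 0)
        elif op == "halt":
            if i != len(C) - 1 or hi != 0:
                raise ValueError(f"misplaced halt at {i}")
            put(i + 1, 0)
        else:
            raise ValueError(f"unknown instruction {ins!r}")
    return h
```

On the nine-instruction code obtained from while x < 3 do x := x + 1 with table 4, it returns [0, 1, 2, 0, 1, 2, 1, 0, 0, 0]; the last entry is h(|C|).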

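The conditions defining the predicate C : h are strong enough to entail that h correctly predicts the stack height and to guarantee the uniqueness of h up to the initial condition.

$$\begin{array}{ll}
C[i] = & \text{Conditions for } C : h\\ \hline
\mathsf{cnst}(n) \text{ or } \mathsf{var}(x) & h(i+1) = h(i)+1\\
\mathsf{add} & h(i) \geq 2,\ h(i+1) = h(i)-1\\
\mathsf{setvar}(x) & h(i) = 1,\ h(i+1) = 0\\
\mathsf{branch}(k) & 0 \leq i+k+1 \leq |C|,\ h(i) = h(i+1) = h(i+k+1) = 0\\
\mathsf{bge}(k) & 0 \leq i+k+1 \leq |C|,\ h(i) = 2,\ h(i+1) = h(i+k+1) = 0\\
\mathsf{halt} & i = |C|-1,\ h(i) = h(i+1) = 0
\end{array}$$

Table 3: Conditions for well-formed code

$$\begin{array}{l}
\mathcal{C}(x) = \mathsf{var}(x) \qquad \mathcal{C}(n) = \mathsf{cnst}(n) \qquad \mathcal{C}(e+e') = \mathcal{C}(e) \cdot \mathcal{C}(e') \cdot \mathsf{add}\\
\mathcal{C}(e<e',k) = \mathcal{C}(e') \cdot \mathcal{C}(e) \cdot \mathsf{bge}(k)\\
\mathcal{C}(x:=e) = \mathcal{C}(e) \cdot \mathsf{setvar}(x) \qquad \mathcal{C}(S;S') = \mathcal{C}(S) \cdot \mathcal{C}(S')\\
\mathcal{C}(\mathsf{if}\ b\ \mathsf{then}\ S\ \mathsf{else}\ S') = \mathcal{C}(b,k) \cdot \mathcal{C}(S) \cdot (\mathsf{branch}(k')) \cdot \mathcal{C}(S') \quad \text{where } k = sz(S)+1,\ k' = sz(S')\\
\mathcal{C}(\mathsf{while}\ b\ \mathsf{do}\ S) = \mathcal{C}(b,k) \cdot \mathcal{C}(S) \cdot \mathsf{branch}(k') \quad \text{where } k = sz(S)+1,\ k' = -(sz(b)+sz(S)+1)\\
\mathcal{C}(\mathsf{prog}\ S) = \mathcal{C}(S) \cdot \mathsf{halt}
\end{array}$$

Table 4: Compilation from Imp to Vm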

Proposition 2 (1) If $C : h$, $C \vdash (i,\sigma,s) \to^* (j,\sigma',s')$ and $h(i) = |\sigma|$ then $h(j) = |\sigma'|$. (2) If $C : h$, $C : h'$ and $h(0) = h'(0)$ then $h = h'$.

2.3 Compilation from Imp to Vm 

In table 4 we define compilation functions $\mathcal{C}$ from Imp to Vm which operate on expressions, boolean conditions, statements, and programs. We write sz(e), sz(b), sz(S) for the number of instructions the compilation function associates with the expression e, the boolean condition b, and the statement S, respectively.
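Table 4 can be transcribed directly, with sz(·) computed as the length of the generated code. In this Python sketch the encoding is an assumption of ours (commands are nested tuples such as `("while", b, S)`, instructions are tuples such as `("bge", k)`), and so is the case for skip: table 4 does not list it, and we compile it to the empty sequence.

```python
def comp_expr(e):
    """C(x) = var(x), C(n) = cnst(n), C(e + e') = C(e) . C(e') . add."""
    tag = e[0]
    if tag == "var":
        return [("var", e[1])]
    if tag == "num":
        return [("cnst", e[1])]
    if tag == "add":
        return comp_expr(e[1]) + comp_expr(e[2]) + [("add",)]
    raise ValueError(f"unknown expression {e!r}")


def comp_bool(b, k):
    """C(e < e', k) = C(e') . C(e) . bge(k)."""
    _, e1, e2 = b
    return comp_expr(e2) + comp_expr(e1) + [("bge", k)]


def comp_stmt(S):
    tag = S[0]
    if tag == "skip":
        return []  # assumption: skip is not listed in table 4
    if tag == "assign":
        return comp_expr(S[2]) + [("setvar", S[1])]
    if tag == "seq":
        return comp_stmt(S[1]) + comp_stmt(S[2])
    if tag == "if":
        _, b, s1, s2 = S
        c1, c2 = comp_stmt(s1), comp_stmt(s2)
        return comp_bool(b, len(c1) + 1) + c1 + [("branch", len(c2))] + c2
    if tag == "while":
        _, b, body = S
        c = comp_stmt(body)
        sz_b = len(comp_bool(b, 0))  # sz(b) does not depend on the offset
        return comp_bool(b, len(c) + 1) + c + [("branch", -(sz_b + len(c) + 1))]
    raise ValueError(f"unknown command {S!r}")


def comp_prog(S):
    """C(prog S) = C(S) . halt."""
    return comp_stmt(S) + [("halt",)]
```

With the bge semantics of table 2, the jump emitted for a condition is taken exactly when the condition evaluates to false, which is why the else branch (or the loop exit) sits at offset k = sz(S) + 1.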

We follow [10] for the proof of soundness of the compilation function for expressions and
boolean conditions (see also for a much older reference). 

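Proposition 3 The following properties hold:

(1) If $(e,s) \Downarrow v$ then $C \cdot \mathcal{C}(e) \cdot C' \vdash (i,\sigma,s) \to^* (j, v \cdot \sigma, s)$ where $i = |C|$ and $j = |C \cdot \mathcal{C}(e)|$.

(2) If $(b,s) \Downarrow \mathsf{true}$ then $C \cdot \mathcal{C}(b,k) \cdot C' \vdash (i,\sigma,s) \to^* (j,\sigma,s)$ where $i = |C|$ and $j = |C \cdot \mathcal{C}(b,k)|$.

(3) If $(b,s) \Downarrow \mathsf{false}$ then $C \cdot \mathcal{C}(b,k) \cdot C' \vdash (i,\sigma,s) \to^* (j+k,\sigma,s)$ where $i = |C|$ and $j = |C \cdot \mathcal{C}(b,k)|$.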

Next we focus on the compilation of statements. We introduce a ternary relation R(C, i, K) which relates a Vm code C, a number $i \in \{0, \ldots, |C|-1\}$ and a continuation K. The intuition is that, relative to the code C, the instruction i can be regarded as having continuation K. (A formal definition is available in appendix A.) We can then state the correctness of the compilation function as follows.

Proposition 4 If $(S,K,s) \to (S',K',s')$ and $R(C,i,S \cdot K)$ then $C \vdash (i,\sigma,s) \to^* (j,\sigma,s')$ and $R(C,j,S' \cdot K')$.

As announced, we can prove that the result of the compilation is a well-formed code. 
Proposition 5 For any program P there is a unique h such that C(P) : h. 
$$\begin{array}{ll}
\text{Rule} & M[i] = \\ \hline
M \vdash (i,m) \to (i+1, m[n/R]) & \mathsf{loadi}\ R, n\\
M \vdash (i,m) \to (i+1, m[m(l)/R]) & \mathsf{load}\ R, l\\
M \vdash (i,m) \to (i+1, m[m(R)/l]) & \mathsf{store}\ R, l\\
M \vdash (i,m) \to (i+1, m[m(R')+m(R'')/R]) & \mathsf{add}\ R, R', R''\\
M \vdash (i,m) \to (i+k+1, m) & \mathsf{branch}\ k\\
M \vdash (i,m) \to (i+1, m) & \mathsf{bge}\ R, R', k \text{ and } m(R) <_{\mathbb{Z}} m(R')\\
M \vdash (i,m) \to (i+k+1, m) & \mathsf{bge}\ R, R', k \text{ and } m(R) \geq_{\mathbb{Z}} m(R')
\end{array}$$

Table 5: Operational semantics of Mips programs



2.4 Mips: language and semantics 

We consider a Mips-like machine [8] which includes the following elements: (1) a fixed code M (a sequence of instructions), (2) a program counter pc, (3) a finite set of registers including the registers A, B, and $R_0, \ldots, R_{b-1}$, and (4) an (infinite) main memory which maps locations to integers.

We denote with R, R', ... registers, with l, l', ... locations and with m, m', ... memories, which are total functions from registers and locations to (unbounded) integers. We denote with M a list of instructions. The operational semantics is formalised in table 5 by rules of the shape $M \vdash (i,m) \to (j,m')$, where M is a list of Mips instructions, i, j are natural numbers and m, m' are memories. We write $(M,m) \Downarrow m'$ if $M \vdash (0,m) \to^* (j,m')$ and $M[j] = \mathsf{halt}$.

2.5 Compilation from Vm to Mips 

In order to compile Vm programs to Mips programs we make the following hypotheses: (1) for every Vm program variable x we reserve an address $l_x$, (2) for every natural number $h \geq b$, we reserve an address $l_h$ (the addresses $l_x, l_h, \ldots$ are all distinct), and (3) we store the first b elements of the stack $\sigma$ in the registers $R_0, \ldots, R_{b-1}$ and the remaining ones (if any) at the addresses $l_b, l_{b+1}, \ldots$.

We say that the memory m represents the stack $\sigma$ and the store s, and write $m \Vdash \sigma, s$, if the following conditions are satisfied: (1) $s(x) = m(l_x)$, and (2) if $0 \leq i < |\sigma|$ then $\sigma[i] = m(R_i)$ if $i < b$, and $\sigma[i] = m(l_i)$ if $i \geq b$.

The compilation function $\mathcal{C}'$ from Vm to Mips is described in table 6. It operates on a well-formed Vm code C whose last instruction is halt. Hence, by proposition 2(2), there is a unique h such that C : h. We denote with $\mathcal{C}'(C)$ the concatenation $\mathcal{C}'(0,C) \cdots \mathcal{C}'(|C|-1,C)$. Given a well-formed Vm code C with $i < |C|$, we denote with p(i, C) the position of the first instruction in $\mathcal{C}'(C)$ which corresponds to the compilation of the instruction with position i in C. This is defined as¹ $p(i,C) = \sum_{0 \leq j < i} d(j,C)$, where the function d(i, C) is defined as $d(i,C) = |\mathcal{C}'(i,C)|$. Hence d(i, C) is the number of Mips instructions associated with the i-th instruction of the (well-formed) code C. The functional correctness of the compilation function can then be stated as follows.

Proposition 6 Let C : h be a well-formed code. If $C \vdash (i,\sigma,s) \to (j,\sigma',s')$ with $h(i) = |\sigma|$ and $m \Vdash \sigma, s$ then $\mathcal{C}'(C) \vdash (p(i,C),m) \to^* (p(j,C),m')$ and $m' \Vdash \sigma', s'$.

¹ There is an obvious circularity in this definition that can be easily eliminated by defining first the function d following the case analysis in table 6, then the function p, and finally the function $\mathcal{C}'$ as in table 6.
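Following the order suggested in the footnote, once the sizes d(i, C) are known, the positions p and the offsets of the Mips jumps follow by prefix sums. In this Python sketch the list `sizes` (with `sizes[i]` standing for d(i, C)) is taken as an input, so nothing here depends on the register layout; the function names are hypothetical.

```python
from itertools import accumulate


def positions(sizes):
    """p(i, C) = sum of d(j, C) for 0 <= j < i, for every i in 0..|C|."""
    return [0] + list(accumulate(sizes))


def mips_offset(p, i, k):
    """Offset k' = p(i+k+1, C) - p(i+1, C) for the Vm jump with offset k at position i.

    The Mips jump is the last instruction of C'(i, C), at position p(i+1, C) - 1,
    so it lands on p(i+1, C) - 1 + k' + 1 = p(i+k+1, C).
    """
    return p[i + k + 1] - p[i + 1]
```

For instance, with made-up sizes [1, 2, 3, 1, 2, 2, 1, 1, 1] the positions are [0, 1, 3, 6, 7, 9, 11, 12, 13, 14], and a backward Vm jump with offset −8 at position 7 becomes a Mips jump with offset −13 at position 12, which lands on position 0.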
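$$\begin{array}{lll}
C[i] = & \mathcal{C}'(i,C) & \\ \hline
\mathsf{cnst}(n) & (\mathsf{loadi}\ R_h, n) & \text{if } h = h(i) < b\\
 & (\mathsf{loadi}\ A, n) \cdot (\mathsf{store}\ A, l_h) & \text{otherwise}\\
\mathsf{var}(x) & (\mathsf{load}\ R_h, l_x) & \text{if } h = h(i) < b\\
 & (\mathsf{load}\ A, l_x) \cdot (\mathsf{store}\ A, l_h) & \text{otherwise}\\
\mathsf{add} & (\mathsf{add}\ R_{h-2}, R_{h-2}, R_{h-1}) & \text{if } h = h(i) < b+1\\
 & (\mathsf{load}\ A, l_{h-1}) \cdot (\mathsf{add}\ R_{h-2}, R_{h-2}, A) & \text{if } h = h(i) = b+1\\
 & (\mathsf{load}\ A, l_{h-1}) \cdot (\mathsf{load}\ B, l_{h-2}) \cdot (\mathsf{add}\ A, B, A) \cdot (\mathsf{store}\ A, l_{h-2}) & \text{if } h = h(i) > b+1\\
\mathsf{setvar}(x) & (\mathsf{store}\ R_{h-1}, l_x) & \text{if } h = h(i) \leq b\\
 & (\mathsf{load}\ A, l_{h-1}) \cdot (\mathsf{store}\ A, l_x) & \text{if } h = h(i) > b\\
\mathsf{branch}(k) & (\mathsf{branch}\ k') & \text{if } k' = p(i+k+1,C) - p(i+1,C)\\
\mathsf{bge}(k) & (\mathsf{bge}\ R_{h-1}, R_{h-2}, k') & \text{if } h = h(i) < b+1\\
 & (\mathsf{load}\ A, l_{h-1}) \cdot (\mathsf{bge}\ A, R_{h-2}, k') & \text{if } h = h(i) = b+1\\
 & (\mathsf{load}\ A, l_{h-2}) \cdot (\mathsf{load}\ B, l_{h-1}) \cdot (\mathsf{bge}\ B, A, k') & \text{if } h = h(i) > b+1,\ k' = p(i+k+1,C) - p(i+1,C)\\
\mathsf{halt} & \mathsf{halt} &
\end{array}$$

Table 6: Compilation from Vm to Mips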



3 Labelling approach for the toy compiler 

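We apply the labelling approach introduced in section 1 to the toy compiler, which results in the following diagram.

[Diagram: Imp, Vm and Mips, their labelled versions $\mathrm{Imp}_\ell$, $\mathrm{Vm}_\ell$ and $\mathrm{Mips}_\ell$, the compilation functions $\mathcal{C}$ and $\mathcal{C}'$, the erasures $\mathit{er}_{\mathrm{Imp}}$, $\mathit{er}_{\mathrm{Vm}}$, $\mathit{er}_{\mathrm{Mips}}$, the labelling $\mathcal{L}$ and the instrumentation $\mathcal{I}$, subject to the following equations.]

$$\mathit{er}_{\mathrm{Vm}} \circ \mathcal{C} = \mathcal{C} \circ \mathit{er}_{\mathrm{Imp}} \qquad \mathit{er}_{\mathrm{Mips}} \circ \mathcal{C}' = \mathcal{C}' \circ \mathit{er}_{\mathrm{Vm}} \qquad \mathit{er}_{\mathrm{Imp}} \circ \mathcal{L} = \mathrm{id}_{\mathrm{Imp}} \qquad An = \mathcal{I} \circ \mathcal{L}$$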

3.1 Labelled Imp 

We extend the syntax so that statements can be labelled: $S ::= \ldots \mid \ell : S$. For instance, $\ell : (\mathsf{while}\ (n < x)\ \mathsf{do}\ \ell : S)$ is a labelled command. The small-step semantics of statements defined in table 1 is extended as follows.

$$(\ell : S, K, s) \xrightarrow{\ell} (S, K, s)$$

We denote with $\lambda, \lambda', \ldots$ finite sequences of labels. In particular, we denote with $\epsilon$ the empty sequence and identify an unlabelled transition with a transition labelled with $\epsilon$. Then the small-step reduction relation we have defined on statements becomes a labelled transition system. There is an obvious erasure function $\mathit{er}_{\mathrm{Imp}}$ from the labelled language to the unlabelled one which is the identity on expressions and boolean conditions, and traverses commands removing all labels. We derive a labelled big-step semantics as follows: $(S,s) \Downarrow (s',\lambda)$ if $(S,\mathsf{halt},s) \xrightarrow{\lambda_1} \cdots \xrightarrow{\lambda_n} (\mathsf{skip},\mathsf{halt},s')$ and $\lambda = \lambda_1 \cdots \lambda_n$.
3.2 Labelled Vm

We introduce a new instruction nop(ℓ) whose semantics is defined as follows:

$$C \vdash (i,\sigma,s) \xrightarrow{\ell} (i+1,\sigma,s) \quad \text{if } C[i] = \mathsf{nop}(\ell).$$

The erasure function $\mathit{er}_{\mathrm{Vm}}$ amounts to removing from a Vm code C all the nop(ℓ) instructions and recomputing jumps accordingly. Specifically, let n(C, i, j) be the number of nop instructions in the interval [i, j]. Then, assuming C[i] = branch(k), we replace the offset k with an offset k' determined as follows:

$$k' = \begin{cases} k - n(C,i,i+k) & \text{if } k \geq 0\\ k + n(C,i+1+k,i) & \text{if } k < 0 \end{cases}$$
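The erasure $\mathit{er}_{\mathrm{Vm}}$ can be sketched in Python as follows. Two assumptions are made here: n(C, i, j) counts the nop instructions at positions i to j, both included, and the same offset correction is applied to bge(k) as to branch(k). The tuple encoding (e.g. `("nop", "l")`) and the function name are ours.

```python
def erase_nops(C):
    """er_Vm: drop every nop(l) and recompute the offsets of branch/bge."""

    def nops(a, b):
        # n(C, a, b): number of nop instructions at positions a..b (both included)
        return sum(1 for ins in C[a:b + 1] if ins[0] == "nop")

    erased = []
    for i, ins in enumerate(C):
        if ins[0] == "nop":
            continue
        if ins[0] in ("branch", "bge"):
            k = ins[1]
            new_k = k - nops(i, i + k) if k >= 0 else k + nops(i + 1 + k, i)
            ins = (ins[0], new_k)
        erased.append(ins)
    return erased
```

Erasing the code of while x < 3 do ℓ : x := x + 1 (a nop in front of the loop body) or of the same loop with a labelled condition (a nop in front of the test, which is also the target of the backward branch) gives back, in both cases, the unlabelled code cnst(3) · var(x) · bge(5) · var(x) · cnst(1) · add · setvar(x) · branch(−8) · halt.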

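The compilation function $\mathcal{C}$ is extended to $\mathrm{Imp}_\ell$ by defining:

$$\mathcal{C}(\ell : b, k) = (\mathsf{nop}(\ell)) \cdot \mathcal{C}(b,k) \qquad \mathcal{C}(\ell : S) = (\mathsf{nop}(\ell)) \cdot \mathcal{C}(S).$$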



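Proposition 7 For all commands S in $\mathrm{Imp}_\ell$ we have that:

(1) $\mathit{er}_{\mathrm{Vm}}(\mathcal{C}(S)) = \mathcal{C}(\mathit{er}_{\mathrm{Imp}}(S))$.

(2) If $(S,s) \Downarrow (s',\lambda)$ then $(\mathcal{C}(S),s) \Downarrow (s',\lambda)$.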

Remark 8 In the current formulation, a sequence of transitions $\lambda$ in the source code must be simulated by the same sequence of transitions in the object code. However, in the actual computation of the costs, the order of the labels occurring in the sequence is immaterial. Therefore one may consider a more relaxed notion of simulation where $\lambda$ is a multi-set of labels.

3.3 Labelled Mips 

The labelled extension of Mips is similar to the one of Vm. We add an instruction nop ℓ whose semantics is defined as follows:

$$M \vdash (i,m) \xrightarrow{\ell} (i+1,m) \quad \text{if } M[i] = (\mathsf{nop}\ \ell).$$

The erasure function $\mathit{er}_{\mathrm{Mips}}$ is also similar to the one of Vm, as it amounts to removing from a Mips code all the (nop ℓ) instructions and recomputing jumps accordingly. The compilation function $\mathcal{C}'$ is extended to $\mathrm{Vm}_\ell$ by simply translating nop(ℓ) as (nop ℓ):

$$\mathcal{C}'(i,C) = (\mathsf{nop}\ \ell) \quad \text{if } C[i] = \mathsf{nop}(\ell).$$

The evaluation predicate for labelled Mips is defined as $(M,m) \Downarrow (m',\lambda)$ if $M \vdash (0,m) \xrightarrow{\lambda_1} \cdots \xrightarrow{\lambda_n} (j,m')$, $\lambda = \lambda_1 \cdots \lambda_n$ and $M[j] = \mathsf{halt}$. The following proposition relates $\mathrm{Vm}_\ell$ code and its compilation, and it is similar to proposition 7.

Proposition 9 Let C be a $\mathrm{Vm}_\ell$ code. Then:

(1) $\mathit{er}_{\mathrm{Mips}}(\mathcal{C}'(C)) = \mathcal{C}'(\mathit{er}_{\mathrm{Vm}}(C))$.

(2) If $(C,s) \Downarrow (s',\lambda)$ and $m \Vdash \epsilon, s$ then $(\mathcal{C}'(C),m) \Downarrow (m',\lambda)$ and $m' \Vdash \epsilon, s'$.
3.4 Labellings and instrumentations

Assuming a function $\kappa$ which associates an integer number with labels and a distinct variable cost which does not occur in the program P under consideration, we abbreviate with inc(ℓ) the assignment $\mathit{cost} := \mathit{cost} + \kappa(\ell)$. Then we define the instrumentation $\mathcal{I}$ (relative to $\kappa$ and cost) as follows:

$$\mathcal{I}(\ell : S) = \mathsf{inc}(\ell); \mathcal{I}(S).$$

The function $\mathcal{I}$ just distributes over the other operators of the language. We extend the function $\kappa$ on labels to sequences of labels by defining $\kappa(\ell_1, \ldots, \ell_n) = \kappa(\ell_1) + \cdots + \kappa(\ell_n)$. The instrumented Imp program relates to the labelled one as follows.

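Proposition 10 Let S be an $\mathrm{Imp}_\ell$ command. If $(\mathcal{I}(S), s[c/\mathit{cost}]) \Downarrow s'[c+\delta/\mathit{cost}]$ then $\exists \lambda\ \kappa(\lambda) = \delta$ and $(S, s[c/\mathit{cost}]) \Downarrow (s'[c/\mathit{cost}], \lambda)$.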
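A minimal Python sketch of the instrumentation, under assumptions of ours: labelled commands are encoded as `("label", l, S)`, other commands as nested tuples such as `("while", b, S)`, κ is a dictionary from labels to integers, and the cost variable is named `"cost"`. The function inserts inc(ℓ) in front of each labelled command and otherwise distributes over the syntax; the names are hypothetical.

```python
def inc(label, kappa):
    """inc(l) is the assignment cost := cost + kappa(l)."""
    return ("assign", "cost", ("add", ("var", "cost"), ("num", kappa[label])))


def instrument(S, kappa):
    """I(l : S) = inc(l); I(S), and I distributes over the other constructs."""
    tag = S[0]
    if tag == "label":
        _, label, body = S
        return ("seq", inc(label, kappa), instrument(body, kappa))
    if tag == "seq":
        return ("seq", instrument(S[1], kappa), instrument(S[2], kappa))
    if tag == "if":
        return ("if", S[1], instrument(S[2], kappa), instrument(S[3], kappa))
    if tag == "while":
        return ("while", S[1], instrument(S[2], kappa))
    return S  # skip and assignments contain no labels


def kappa_seq(labels, kappa):
    """kappa(l1, ..., ln) = kappa(l1) + ... + kappa(ln)."""
    return sum(kappa[l] for l in labels)
```

For example, with κ(a) = 2 and κ(b) = 5, the command a : while x < 3 do b : x := 3 started with x = 0 crosses a and then b once, so its instrumented version increases cost by κ(a, b) = 7, which plays the role of δ in proposition 10.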

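Definition 11 A labelling is a function $\mathcal{L}$ from an unlabelled language to the corresponding labelled one such that $\mathit{er}_{\mathrm{Imp}} \circ \mathcal{L}$ is the identity function on the Imp language.

Proposition 12 For any labelling function $\mathcal{L}$ and Imp program P, the following holds:

$$\mathit{er}_{\mathrm{Mips}}(\mathcal{C}'(\mathcal{C}(\mathcal{L}(P)))) = \mathcal{C}'(\mathcal{C}(P)). \tag{3}$$

Proposition 13 Given a function $\kappa$ for the labels and a labelling function $\mathcal{L}$, for all programs P of the source language, if $(\mathcal{I}(\mathcal{L}(P)), s[c/\mathit{cost}]) \Downarrow s'[c+\delta/\mathit{cost}]$ and $m \Vdash \epsilon, s[c/\mathit{cost}]$ then $(\mathcal{C}'(\mathcal{C}(\mathcal{L}(P))),m) \Downarrow (m',\lambda)$, $m' \Vdash \epsilon, s'[c/\mathit{cost}]$ and $\kappa(\lambda) = \delta$.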

3.5 Sound and precise labellings 

With any $\mathrm{Mips}_\ell$ code M we can associate a directed and rooted (control flow) graph whose nodes are the instruction positions $\{0, \ldots, |M|-1\}$, whose root is the node 0, and whose directed edges correspond to the possible transitions between instructions. We say that a node is labelled if it corresponds to an instruction nop ℓ.

Definition 14 A simple path in a $\mathrm{Mips}_\ell$ code M is a directed finite path in the graph associated with M where the first node is labelled, the last node is the predecessor of either a labelled node or a leaf, and all the other nodes are unlabelled.

Definition 15 A $\mathrm{Mips}_\ell$ code M is soundly labelled if in the associated graph the root node is labelled and there are no loops that do not go through a labelled node.

In a soundly labelled graph there are finitely many simple paths. Thus, given a soundly labelled Mips code M, we can associate with every label ℓ a number $\kappa(\ell)$ which is the maximum (estimated) cost of executing a simple path whose first node is labelled with ℓ. We stress that in the following we assume that the cost of a simple path is proportional to the number of Mips instructions that are crossed in the path.
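Since the unlabelled part of a soundly labelled graph is acyclic, κ can be computed with a memoised longest-path search. In this Python sketch the graph encoding and names are ours: `succ` maps each node to its successors, `labels` maps each labelled node to its label, and a simple path stops before a labelled node or a leaf as in definition 14. A label occurring at several nodes gets the maximum over all of them. The optional `cost` argument is our addition: it gives each instruction a weight, and its default of 1 simply counts instructions.

```python
from functools import lru_cache


def label_costs(succ, labels, cost=lambda n: 1):
    """kappa(l): maximum cost of a simple path whose first node is labelled l.

    The graph must be soundly labelled, otherwise the recursion may not terminate.
    """

    @lru_cache(maxsize=None)
    def longest(n):
        # a successor that is labelled or a leaf ends the simple path at n
        tails = [0 if (m in labels or not succ[m]) else longest(m) for m in succ[n]]
        return cost(n) + max(tails, default=0)

    kappa = {}
    for node, label in labels.items():
        kappa[label] = max(kappa.get(label, 0), longest(node))
    return kappa
```

On the graph `{0: [1], 1: [2, 4], 2: [3], 3: [4], 4: []}` with node 0 labelled l, the two simple paths from l have costs 2 and 4, and the sketch returns κ(l) = 4, i.e. the maximum used for the upper bound of proposition 16.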

Proposition 16 If M is soundly labelled and $(M,m) \Downarrow (m',\lambda)$ then the cost of the computation is bounded by $\kappa(\lambda)$.

Thus, for a soundly labelled Mips code, the sequence of labels associated with a computation carries significant information on the execution cost.
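$$\begin{array}{lcl}
\mathcal{L}_s(\mathsf{prog}\ S) &=& \mathsf{prog}\ \ell : \mathcal{L}_s(S)\\
\mathcal{L}_s(\mathsf{skip}) &=& \mathsf{skip}\\
\mathcal{L}_s(x := e) &=& x := e\\
\mathcal{L}_s(S;S') &=& \mathcal{L}_s(S);\mathcal{L}_s(S')\\
\mathcal{L}_s(\mathsf{if}\ b\ \mathsf{then}\ S_1\ \mathsf{else}\ S_2) &=& \mathsf{if}\ b\ \mathsf{then}\ \mathcal{L}_s(S_1)\ \mathsf{else}\ \mathcal{L}_s(S_2)\\
\mathcal{L}_s(\mathsf{while}\ b\ \mathsf{do}\ S) &=& \mathsf{while}\ b\ \mathsf{do}\ \ell : \mathcal{L}_s(S)\\[1ex]
\mathcal{L}_p(\mathsf{prog}\ S) &=& \mathsf{prog}\ \mathcal{L}_p(S)\\
\mathcal{L}_p(S) &=& \mathsf{let}\ \ell = \mathsf{new},\ (S',d) = \mathcal{L}'_p(S)\ \mathsf{in}\ \ell : S'\\
\mathcal{L}'_p(S) &=& (S,0) \quad \text{if } S = \mathsf{skip} \text{ or } S = (x := e)\\
\mathcal{L}'_p(\mathsf{if}\ b\ \mathsf{then}\ S_1\ \mathsf{else}\ S_2) &=& (\mathsf{if}\ b\ \mathsf{then}\ \mathcal{L}_p(S_1)\ \mathsf{else}\ \mathcal{L}_p(S_2),\ 1)\\
\mathcal{L}'_p(\mathsf{while}\ b\ \mathsf{do}\ S) &=& (\mathsf{while}\ b\ \mathsf{do}\ \mathcal{L}_p(S),\ 1)\\
\mathcal{L}'_p(S_1;S_2) &=& \mathsf{let}\ (S'_1,d_1) = \mathcal{L}'_p(S_1),\ (S'_2,d_2) = \mathcal{L}'_p(S_2)\ \mathsf{in}\\
 & & \mathsf{case}\ d_1\ \text{of}\ 0 : (S'_1;S'_2,\ d_2) \mid 1 : \mathsf{let}\ \ell = \mathsf{new}\ \mathsf{in}\ (S'_1;\ \ell : S'_2,\ d_2)
\end{array}$$

Table 7: Two labellings for the Imp language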

Definition 17 We say that a soundly labelled code is precise if for every label ℓ in the code, the simple paths starting from a node labelled with ℓ have the same cost.

In particular, a code is precise if we can associate at most one simple path with every label.

Proposition 18 If M is precisely labelled and $(M,m) \Downarrow (m',\lambda)$ then the cost of the computation is $\kappa(\lambda)$.

The next point we have to check is that there are labelling functions (of the source code) such that the compilation function does produce sound and possibly precise labelled Mips code. To discuss this point, we introduce in Table 7 two labelling functions $\mathcal{L}_s$ and $\mathcal{L}_p$ for the Imp language. The first labelling relies on just one label, while the second one relies on a function "new", which is meant to return fresh labels, and on an auxiliary function $\mathcal{L}'_p$, which returns a labelled command and a binary directive $d \in \{0, 1\}$. If $d = 1$ then the command that follows (if any) must be labelled.
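The two labellings of Table 7 can be transcribed almost literally. The OCaml sketch below is a hypothetical transcription, not the prototype's code: conditions and expressions are kept as strings, the directive $d$ is a boolean (`true` standing for $d = 1$), and `fresh` plays the role of "new". Since $\mathcal{L}_p(\mathbf{prog}\ S) = \mathbf{prog}\ \mathcal{L}_p(S)$, the function `lp` also covers whole programs.

```ocaml
(* Hypothetical Imp AST; conditions and expressions are kept as strings. *)
type label = string

type cmd =
  | Skip
  | Assign of string * string   (* x := e *)
  | Seq of cmd * cmd            (* S; S2 *)
  | If of string * cmd * cmd    (* if b then S1 else S2 *)
  | While of string * cmd       (* while b do S *)
  | Label of label * cmd        (* l : S *)

(* Plays the role of new: returns a fresh label at each call. *)
let fresh : unit -> label =
  let n = ref 0 in
  fun () -> incr n; "l" ^ string_of_int !n

(* L_s: one label l, in front of the program and of every loop body. *)
let rec ls (l : label) (s : cmd) : cmd =
  match s with
  | Skip | Assign _ -> s
  | Seq (s1, s2) -> Seq (ls l s1, ls l s2)
  | If (b, s1, s2) -> If (b, ls l s1, ls l s2)
  | While (b, body) -> While (b, Label (l, ls l body))
  | Label _ -> invalid_arg "ls: command is already labelled"

let ls_prog (l : label) (p : cmd) : cmd = Label (l, ls l p)

(* L_p puts a fresh label in front of the command; the auxiliary function
   returns the labelled command and the directive d (true: the command
   that follows, if any, must be labelled). *)
let rec lp (s : cmd) : cmd =
  let l = fresh () in
  let s1, _ = lp_aux s in
  Label (l, s1)

and lp_aux (s : cmd) : cmd * bool =
  match s with
  | Skip | Assign _ -> (s, false)
  | If (b, s1, s2) -> (If (b, lp s1, lp s2), true)
  | While (b, body) -> (While (b, lp body), true)
  | Seq (s1, s2) ->
      let t1, d1 = lp_aux s1 in
      let t2, d2 = lp_aux s2 in
      if d1 then (Seq (t1, Label (fresh (), t2)), d2) else (Seq (t1, t2), d2)
  | Label _ -> invalid_arg "lp_aux: command is already labelled"
```

For instance, `lp (Seq (If ("b", Skip, Skip), Assign ("x", "e")))` labels the whole command, both branches of the conditional, and the assignment that follows it, which is what makes the paths starting at each label unique.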

Proposition 19 For all Imp programs P:

(1) $\mathcal{C}'(\mathcal{C}(\mathcal{L}_s(P)))$ is a soundly labelled Mips code.

(2) $\mathcal{C}'(\mathcal{C}(\mathcal{L}_p(P)))$ is a soundly and precisely labelled Mips code.

For an example of a command which is not soundly labelled, consider $\ell : \mathbf{while}\ 0 < x\ \mathbf{do}\ x := x + 1$, which, when compiled, produces a loop that does not go through any label. On the other hand, for an example of a program which is not precisely labelled, consider $\ell : (\mathbf{if}\ 0 < x\ \mathbf{then}\ x := x + 1\ \mathbf{else}\ \mathbf{skip})$. In the compiled code, we find two simple paths associated with the label $\ell$ whose costs will in general be quite different.

Once a sound and possibly precise labelling $\mathcal{L}$ has been designed, we can determine the cost of each label and define an instrumentation $\mathcal{I}$ whose composition with $\mathcal{L}$ will produce the desired cost annotation.






1. Label the input Clight program. 

2. Compile the labelled Clight program in the labelled world. This produces a labelled Mips code. 

3. For each label of the labelled Mips code, compute the cost of the instructions under its scope and generate 
a label-cost mapping. An unlabelled Mips code — the result of the compilation — is obtained by removing the 
labels from the labelled Mips code. 

4. Add a fresh cost variable to the labelled Clight program and replace the labels by an increment of this cost 
variable according to the label-cost mapping. The result is an annotated Clight program with no label. 

Table 8: Building the annotation of a Clight program in the labelling approach 

Definition 20 Given a labelling function $\mathcal{L}$ for the source language Imp and a program P in the Imp language, we define an annotation for the source program as follows:

$$\mathit{An}_{imp}(P) = \mathcal{I}(\mathcal{L}(P)).$$
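For Imp, the instrumentation $\mathcal{I}$ (see also Appendix A.6) replaces each labelled command $\ell : S$ by an increment of the cost variable by $\kappa(\ell)$ followed by $\mathcal{I}(S)$. A minimal OCaml sketch, assuming a hypothetical string-based Imp AST and a cost variable literally named `cost`:

```ocaml
(* Hypothetical Imp AST, as a string-based sketch; not the prototype's types. *)
type label = string

type cmd =
  | Skip
  | Assign of string * string   (* x := e, e kept as text *)
  | Seq of cmd * cmd
  | If of string * cmd * cmd
  | While of string * cmd
  | Label of label * cmd        (* l : S *)

(* I: every labelled command l : S becomes  cost := cost + kappa(l); I(S),
   where kappa is the label-cost mapping and cost is the fresh cost variable. *)
let rec instrument (kappa : label -> int) (s : cmd) : cmd =
  match s with
  | Skip | Assign _ -> s
  | Seq (s1, s2) -> Seq (instrument kappa s1, instrument kappa s2)
  | If (b, s1, s2) -> If (b, instrument kappa s1, instrument kappa s2)
  | While (b, body) -> While (b, instrument kappa body)
  | Label (l, body) ->
      Seq (Assign ("cost", Printf.sprintf "cost + %d" (kappa l)), instrument kappa body)

(* An_imp(P) = I(L(P)) for a given labelling function label_prog. *)
let annotate (label_prog : cmd -> cmd) (kappa : label -> int) (p : cmd) : cmd =
  instrument kappa (label_prog p)
```

The result contains no labels: all cost information is carried by assignments to `cost`, which is what makes the annotation readable by ordinary source-level tools.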

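Proposition 21 If P is a program and $\mathcal{C}'(\mathcal{C}(\mathcal{L}(P)))$ is a sound (sound and precise) labelling, then $(\mathit{An}_{imp}(P), s[c/\mathit{cost}]) \Downarrow s'[c+\delta/\mathit{cost}]$ and $m \Vdash \epsilon, s[c/\mathit{cost}]$ entail that $(\mathcal{C}'(\mathcal{C}(P)), m) \Downarrow m'$, $m' \Vdash \epsilon, s'[c/\mathit{cost}]$, and the cost of the execution is bounded by (is exactly) $\delta$.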

To summarise, producing sound and precise labellings is mainly a matter of designing the labelled source language so that the labelling is sufficiently fine grained. For instance, in the toy compiler, it is enough to label commands, while it is not necessary to label boolean conditions and expressions.

Besides soundness and precision, a third criterion to evaluate labellings is that they do not introduce too many unnecessary labels. We call this property economy. There are two reasons for this requirement. On the one hand, we would like to minimise the number of labels so that the source program is not cluttered by too many cost annotations; on the other hand, we would like to maximise the length of the simple paths because, in a modern processor, the longer the sequence of instructions we consider, the more accurate the estimation of their execution cost (on a long sequence certain costs are amortized). In practice, it seems that one can first produce a sound and possibly precise labelling and then apply heuristics to eliminate unnecessary labels.

4 Labelling approach for the C compiler 

This section informally describes the labelled extensions of the languages in the compilation chain (see Appendix B for details), the way the labels are propagated by the compilation functions, the labelling of the source code, the hypotheses on the control flow of the labelled Mips code and the verification that we perform on it, the way we build the instrumentation, and finally the way the labelling approach has been tested. The process of annotating a Clight program using the labelling approach is summarized in Table 8 and is detailed in the following sections.
4.1 Labelled languages

Both the Clight and Cminor languages are extended in the same way by labelling both statements and expressions (by comparison, in the toy language Imp we just labelled statements). The labelling of expressions aims to capture their execution cost precisely. Indeed, Clight and Cminor include expressions such as $a_1 ? a_2 : a_3$ whose evaluation cost depends on the boolean value $a_1$. As both languages are extended in the same way, the extended compilation simply maps labelled Clight statements and expressions to the corresponding labelled Cminor ones.

The labelled versions of RTLAbs and of the languages in the back-end simply consist in adding a new instruction whose semantics is to emit a label without modifying the state. For the CFG-based languages (RTLAbs to LTL), this new instruction is emit label → node. For LIN and Mips, it is emit label. The translation of these label instructions is immediate. In Mips, we also rely on a reserved label begin_function to pinpoint the beginning of a function
code (cf. section FOj) . 

4.2 Labelling of the source language 

As for the toy compiler (cf. end of Section 3), the goals of a labelling are soundness, precision, and possibly economy. We explain our labelling by considering the constructions of Clight and their compilation to Mips.

Sequential instructions A sequence of Clight instructions that compile to sequential Mips 
code, such as a sequence of assignments, can be handled by a single label which covers the 
unique execution path. 

Ternary expressions and conditionals Most Clight expressions compile to sequential Mips code. Ternary expressions, which introduce a branching in the control flow, are one exception. In this case, we achieve precision by associating a label with each branch. This is similar to the treatment of the conditional we have already discussed in Section 3. As for the Clight operations && and ||, which have a lazy semantics, they are transformed into ternary expressions before computing the labelling.
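The following OCaml sketch illustrates this treatment of expressions on a hypothetical fragment of Clight expressions. The encoding of `&&` and `||` as ternaries is an assumption (one C-equivalent choice; the text does not fix it), and the `_costN` label names are made up for the example.

```ocaml
(* Hypothetical fragment of Clight expressions with cost labels. *)
type expr =
  | Var of string
  | Const of int
  | Binop of string * expr * expr     (* a op2 a *)
  | Cond of expr * expr * expr        (* a1 ? a2 : a3 *)
  | And of expr * expr                (* lazy && *)
  | Or of expr * expr                 (* lazy || *)
  | Labelled of string * expr         (* cost label on a sub-expression *)

let fresh : unit -> string =
  let n = ref 0 in
  fun () -> incr n; "_cost" ^ string_of_int !n

(* Lazy operators as ternaries (assumed encoding, same value as in C):
   a && b  becomes  a ? (b ? 1 : 0) : 0
   a || b  becomes  a ? 1 : (b ? 1 : 0) *)
let rec desugar (e : expr) : expr =
  match e with
  | Var _ | Const _ -> e
  | Binop (op, a, b) -> Binop (op, desugar a, desugar b)
  | Cond (a, b, c) -> Cond (desugar a, desugar b, desugar c)
  | And (a, b) -> Cond (desugar a, Cond (desugar b, Const 1, Const 0), Const 0)
  | Or (a, b) -> Cond (desugar a, Const 1, Cond (desugar b, Const 1, Const 0))
  | Labelled (l, e1) -> Labelled (l, desugar e1)

(* Precision for ternaries: each branch gets its own fresh label. *)
let rec label_expr (e : expr) : expr =
  match e with
  | Var _ | Const _ -> e
  | Binop (op, a, b) -> Binop (op, label_expr a, label_expr b)
  | Cond (a, b, c) ->
      Cond (label_expr a, Labelled (fresh (), label_expr b),
            Labelled (fresh (), label_expr c))
  | And _ | Or _ -> label_expr (desugar e)
  | Labelled (l, e1) -> Labelled (l, label_expr e1)
```

Sequential sub-expressions (variables, constants, arithmetic) receive no label of their own: their cost is accounted for by the label covering the enclosing statement.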

Loops Loops in Clight are guarded by a condition. Following the arguments for the previous cases, we add two labels when encountering a loop construct: one label to start the loop's body, and one label when exiting the loop. This is similar to the treatment of while loops discussed in Section 3, and it is enough to guarantee that the loop in the compiled code goes through a label.

Program Labels and Gotos In Clight, program labels and gotos are intraprocedural. 
Their only effect on the control flow of the resulting assembly code is to potentially introduce 
an unguarded loop. This loop must contain at least one cost label in order to satisfy the 
soundness condition, which we ensure by adding a cost label right after a program label. 
Clight        --Labelling-->   Labelled Clight   --Compilation-->   Labelled Mips

lbl:                           lbl:                                 lbl:
  i++;                           _cost:                               emit _cost
  ...                            i++;                                 li $v0, 1
  goto lbl;                      ...                                  add $a0, $a0, $v0
                                 goto lbl;                            ...
                                                                      j lbl



Function calls Function calls in Mips are performed by indirect jumps, the address of the 
callee being in a register. In the general case, this address cannot be inferred statically. Even 
though the destination point of a function call is unknown, when the considered Mips code 
has been produced by our compiler, we know for a fact that this function ends with a return 
statement that transfers the control back to the instruction following the function call in 
the caller. As a result, we treat function calls according to the following global invariants 
of the compilation: (1) the instructions of a function are covered by the labels inside this 
function, (2) we assume a function call always returns and runs the instruction following the 
call. Invariant (1) entails in particular that each function must contain at least one label. To 
ensure this, we simply add a starting label in every function definition. The example below 
illustrates this point: 



Clight        --Labelling-->   Labelled Clight   --Compilation-->   Labelled Mips

void f () {                    void f () {                          f_start:
  f's body                       _cost:                               Frame Creation
}                                f's body                             Initializations
                               }                                      emit _cost
                                                                      f's body
                                                                      Frame Deletion
                                                                      return



We notice that some Mips instructions will be inserted before the first label is emitted. These instructions relate to frame creation and/or variable initializations, and are composed of sequential instructions (no branching). To deal with this issue, we adopt the convention that the instructions that precede the first label in a function code are actually under the scope of the first label. Invariant (2) is of course an over-approximation of the program behaviour, as a function might fail to return because of an infinite loop. In this case, the proposed labelling remains correct: it just assumes that the instructions following the function call will be executed, and takes their cost into consideration. The final computed cost is still an over-approximation of the actual cost.
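The statement-level rules of this section (a label at the start of each loop body and one after each loop, a cost label right after each program label, and a starting label in every function) can be sketched in OCaml on a hypothetical, cut-down statement type. The constructor names and the choice of attaching the loop-exit label to a `skip` are assumptions made for the illustration.

```ocaml
(* Hypothetical, cut-down Clight statements; expressions kept as strings. *)
type stmt =
  | Skip
  | Assign of string * string
  | Seq of stmt * stmt
  | While of string * stmt
  | PLabel of string * stmt      (* program label   lbl : s   *)
  | Goto of string
  | Cost of string * stmt        (* cost label      _cost : s *)

let fresh : unit -> string =
  let n = ref 0 in
  fun () -> incr n; "_cost" ^ string_of_int !n

let rec label_stmt (s : stmt) : stmt =
  match s with
  | Skip | Assign _ | Goto _ -> s
  | Seq (s1, s2) -> Seq (label_stmt s1, label_stmt s2)
  (* loops: a label at the start of the body and one after the loop exit *)
  | While (b, body) ->
      Seq (While (b, Cost (fresh (), label_stmt body)), Cost (fresh (), Skip))
  (* program labels: a cost label right after the program label, so that
     a loop built with goto always crosses a cost label *)
  | PLabel (lbl, body) -> PLabel (lbl, Cost (fresh (), label_stmt body))
  | Cost _ -> invalid_arg "label_stmt: statement is already labelled"

(* functions: a starting label covering the body (and, by the convention
   above, the frame-creation code that precedes it in Mips) *)
let label_function_body (body : stmt) : stmt = Cost (fresh (), label_stmt body)
```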



4.3 Verifications on the object code 

The labelling previously described has been designed so that the compiled Mips code satisfies the soundness and precision conditions. However, we do not need to prove this; instead, we devise an algorithm that checks the conditions on the compiled code. The algorithm assumes a correct management of function calls in the compiled code. In particular, when we call a function we always jump to the first instruction of the corresponding code segment, and when we return we always jump to an instruction that follows a call. We stress that this is a reasonable hypothesis that is essentially subsumed by the proof that the object code simulates the source code.
In our current implementation, we check the soundness and the precision conditions while simultaneously building the label-cost mapping. To this end, the algorithm takes the following main steps.

• First, for each function a control flow graph is built. 

• For each graph, we check whether there is a unique label that is reachable from the root by a unique path. This unique path corresponds to the instructions generated by the calling conventions, as discussed in Section 4.2. We then shift the occurrence of the label to the root of the graph.

• By a strongly connected components algorithm, we check whether every loop in the graphs goes through 
at least one label. 

• We perform a (depth-first) search of the graph. Whenever we reach a labelled node, we perform a 
second (depth-first) search that stops at labelled nodes and computes an upper bound on the cost of the 
occurrence of the label. Of course, when crossing a branching instruction, we take the maximum cost 
of the branches. When the second search stops we update the current cost of the label-cost mapping 
(by taking a maximum) and we continue the first search. 

• Warning messages are emitted whenever the maximum is taken between two different values as in this 
case the precision condition may be violated. 
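These checks and the cost computation can be sketched in OCaml on a hypothetical CFG representation in which node 0 is the function entry and each node carries the cost of its instructions. Two deviations from the steps above are deliberate: loop soundness is checked with a three-colour depth-first search on the unlabelled nodes instead of a strongly connected components algorithm (a loop avoids every label exactly when it is a cycle among unlabelled nodes), and every labelled node is processed rather than only those reached by a first search from the root. The label shift of the second step is assumed to have been done already, so the entry node must be labelled.

```ocaml
(* Hypothetical CFG of one function: node 0 is the entry; each node has the
   cost of its instructions, an optional cost label and its successors. *)
type node = { label : string option; cost : int; succs : int list }

type colour = White | Grey | Black

exception Label_free_loop of int

(* A loop avoids every label iff it is a cycle among unlabelled nodes, so we
   look for such a cycle with a three-colour depth-first search. *)
let check_loops (g : node array) : unit =
  let colour = Array.make (Array.length g) White in
  let rec visit i =
    if g.(i).label = None then
      match colour.(i) with
      | Grey -> raise (Label_free_loop i)
      | Black -> ()
      | White ->
          colour.(i) <- Grey;
          List.iter visit g.(i).succs;
          colour.(i) <- Black
  in
  Array.iteri (fun i _ -> visit i) g

(* Costliest simple path from node i up to the next labelled node (excluded)
   or the end of the function. Terminates once check_loops has succeeded;
   no memoisation, for brevity. [warn i] signals branches of different cost. *)
let rec path_cost (g : node array) (warn : int -> unit) (i : int) : int =
  let succ_cost s = if g.(s).label <> None then 0 else path_cost g warn s in
  match List.map succ_cost g.(i).succs with
  | [] -> g.(i).cost
  | c :: cs ->
      if List.exists (fun c' -> c' <> c) cs then warn i;
      g.(i).cost + List.fold_left max c cs

(* Label-cost mapping; a label occurring twice keeps the maximum cost. *)
let label_costs (g : node array) (warn : int -> unit) : (string, int) Hashtbl.t =
  if g.(0).label = None then invalid_arg "label_costs: entry node is not labelled";
  check_loops g;
  let tbl = Hashtbl.create 16 in
  Array.iteri
    (fun i n ->
      match n.label with
      | None -> ()
      | Some l -> (
          let c = path_cost g warn i in
          match Hashtbl.find_opt tbl l with
          | Some c' ->
              if c' <> c then warn i;
              Hashtbl.replace tbl l (max c c')
          | None -> Hashtbl.replace tbl l c))
    g;
  tbl
```

A warning here does not make the code unsound: the mapping still yields an upper bound, in line with Proposition 16, but exactness in the sense of Proposition 18 may be lost.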

4.4 Building the cost annotation 

Once the label-cost mapping is computed, instrumenting the labelled source code is an easy task. A fresh global variable, which we call the cost variable, is added to the source program with the purpose of holding the cost value, and it is initialised at the very beginning of the main program. Then, every label is replaced by an increment of the cost variable according to the label-cost mapping. Following this replacement, the cost labels disappear and the result is a Clight program with annotations in the form of assignments.

There is one final problem: labels inside expressions. As we already mentioned, Clight does not allow writing side-effect instructions, such as cost increments, inside expressions. To cope with this restriction, we first produce an instrumented C program, with side effects in expressions, which we then translate back to Clight using CIL. The process is thus: labelled Clight and label-cost mapping → instrumented C → CIL → instrumented Clight.
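One way to write a cost increment inside a C expression is the comma operator. The OCaml sketch below prints a hypothetical labelled expression type as C text, turning each labelled sub-expression into `(__cost += k, e)`. Both this encoding and the variable name `__cost` are assumptions, not the prototype's actual output; the resulting C code would then go through CIL as described above.

```ocaml
(* Hypothetical labelled expressions (a small Clight fragment). *)
type expr =
  | Var of string
  | Const of int
  | Binop of string * expr * expr
  | Cond of expr * expr * expr
  | Labelled of string * expr

(* Print an expression as C source; a labelled sub-expression l : e becomes
   the comma expression (__cost += kappa(l), e), which increments the cost
   variable and then yields the value of e. *)
let rec to_c (kappa : string -> int) (e : expr) : string =
  match e with
  | Var x -> x
  | Const n -> string_of_int n
  | Binop (op, a, b) -> Printf.sprintf "(%s %s %s)" (to_c kappa a) op (to_c kappa b)
  | Cond (a, b, c) ->
      Printf.sprintf "(%s ? %s : %s)" (to_c kappa a) (to_c kappa b) (to_c kappa c)
  | Labelled (l, e1) -> Printf.sprintf "(__cost += %d, %s)" (kappa l) (to_c kappa e1)
```

For example, with $\kappa(\ell_1) = 4$ and $\kappa(\ell_2) = 7$, the labelled ternary `Cond (Var "x", Labelled ("l1", Const 1), Labelled ("l2", Var "y"))` prints as `(x ? (__cost += 4, 1) : (__cost += 7, y))`, so only the increment of the branch actually taken is executed.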



4.5 Testing 

It is desirable to test the coherence of the labelling from Clight to Mips. To this end, each labelled language comes with an interpreter that produces the trace of the labels encountered during the computation. Then, one naive approach is to test the equality of the traces produced by the program at the different stages of the compilation. Our current implementation passes this kind of test. For some optimisations that may re-order computations, the weaker condition mentioned in Remark 8 could be considered.
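As an illustration of such a trace-producing interpreter, here is a minimal OCaml sketch for a hypothetical labelled Imp-like language (not the prototype's interpreters). Comparing two compilation stages then amounts to comparing the returned label lists with `=`.

```ocaml
(* Hypothetical labelled Imp-like language. *)
type expr = Int of int | Var of string | Add of expr * expr | Lt of expr * expr

type cmd =
  | Skip
  | Assign of string * expr
  | Seq of cmd * cmd
  | If of expr * cmd * cmd
  | While of expr * cmd
  | Label of string * cmd

(* Unset variables read as 0; comparisons yield 0 or 1, as in C. *)
let rec eval env e =
  match e with
  | Int n -> n
  | Var x -> (try Hashtbl.find env x with Not_found -> 0)
  | Add (a, b) -> eval env a + eval env b
  | Lt (a, b) -> if eval env a < eval env b then 1 else 0

(* Run a command and return the labels it emits, in order. *)
let trace (c : cmd) : string list =
  let env = Hashtbl.create 8 and out = ref [] in
  let rec exec c =
    match c with
    | Skip -> ()
    | Assign (x, e) -> Hashtbl.replace env x (eval env e)
    | Seq (a, b) -> exec a; exec b
    | If (b, s1, s2) -> if eval env b <> 0 then exec s1 else exec s2
    | While (b, s) -> if eval env b <> 0 then (exec s; exec c)
    | Label (l, s) -> out := l :: !out; exec s
  in
  exec c;
  List.rev !out
```

For instance, a program labelled `l0` that runs a loop with body label `l1` twice yields the trace `["l0"; "l1"; "l1"]`.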

5 Conclusion and future work 

We have discussed the problem of building a compiler which can lift in a provably correct way pieces of information on the execution cost of the object code to cost annotations on the source code. To this end, we have introduced the so-called labelling approach and discussed its formal application to a toy compiler. Based on this experience, we have argued that the approach has good scalability properties, and to substantiate this claim, we have reported on our successful experience in implementing and testing the labelling approach on top of a prototype compiler written in OCaml for a large fragment of the C language, which can be briefly described as Clight without floating point.

We next discuss a few directions for future work. First, we are currently testing the current compiler on the kind of C code produced for embedded applications by a Lustre compiler. Starting from the annotated C code, we are relying on the Frama-C tool to automatically produce meaningful information on, say, the reaction time of a given synchronous program. Second, we are porting the current compiler to other assembly languages. In particular, we are interested in targeting one of the assembly languages covered by the AbsInt tool so as to obtain more realistic estimations of the execution cost of sequences of instructions. Third, we plan to formalise and validate in the Calculus of Inductive Constructions the prototype implementation of the labelling approach for the C compiler described in Appendix B. This requires a major implementation effort which will be carried out in collaboration with our partners of the CerCo project [3].
A Proofs

We omit the proofs that have been checked by K. Memarian with the Coq proof assistant [12].



A.1 Notation

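Let $\stackrel{t}{\to}$ be a family of reduction relations where $t$ ranges over the set of labels and $\epsilon$. Then we define:

$$\stackrel{t}{\Rightarrow} \;=\; \begin{cases} (\stackrel{\epsilon}{\to})^* & \text{if } t = \epsilon \\ (\stackrel{\epsilon}{\to})^* \circ \stackrel{t}{\to} \circ (\stackrel{\epsilon}{\to})^* & \text{otherwise} \end{cases}$$

where, as usual, $R^*$ denotes the reflexive and transitive closure of the relation $R$ and $\circ$ denotes the composition of relations.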

A.2 Proof of Proposition 4

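Given a Vm code $C$, we define an 'accessibility relation' $\stackrel{C}{\leadsto}$ as the least binary relation on $\{0, \ldots, |C| - 1\}$ such that:

$$\frac{}{i \stackrel{C}{\leadsto} i} \qquad \frac{C[i] = \mathtt{branch}(k) \quad (i + k + 1) \stackrel{C}{\leadsto} j}{i \stackrel{C}{\leadsto} j}$$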

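We also introduce a ternary relation $R(C, i, K)$ which relates a Vm code $C$, a number $i \in \{0, \ldots, |C| - 1\}$ and a continuation $K$. The relation is defined as the least one that satisfies the following conditions:

$$\frac{i \stackrel{C}{\leadsto} j \quad C[j] = \mathtt{halt}}{R(C, i, \mathtt{halt})} \qquad \frac{i \stackrel{C}{\leadsto} i' \quad C = C_1 \cdot \mathcal{C}(S) \cdot C_2 \quad i' = |C_1| \quad j = |C_1 \cdot \mathcal{C}(S)| \quad R(C, j, K)}{R(C, i, S \cdot K)}$$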
The following properties are useful. 

Lemma 22 (1) The relation $\stackrel{C}{\leadsto}$ is transitive.
(2) If $i \stackrel{C}{\leadsto} j$ and $R(C, j, K)$ then $R(C, i, K)$.

The first property can be proven by induction on the definition of $\stackrel{C}{\leadsto}$ and the second by induction on the structure of $K$.

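Next we can focus on the proposition. The notation $C_1 \stackrel{i}{\cdot} C_2$ means that $i = |C_1|$. Suppose that:

$$(S, K, s) \to (S', K', s') \quad (1) \qquad \text{and} \qquad R(C, i, S \cdot K) \quad (2).$$

From (2), we know that there exist $i'$ and $i''$ such that:

$$i \stackrel{C}{\leadsto} i' \quad (3), \qquad C = C_1 \stackrel{i'}{\cdot} \mathcal{C}(S) \stackrel{i''}{\cdot} C_2 \quad (4), \qquad \text{and} \qquad R(C, i'', K) \quad (5),$$

and from (3) it follows that:

$$C \vdash (i, \sigma, s) \stackrel{*}{\to} (i', \sigma, s) \quad (3').$$

We are looking for $j$ such that:

$$C \vdash (i, \sigma, s) \stackrel{*}{\to} (j, \sigma, s') \quad (6), \qquad \text{and} \qquad R(C, j, S' \cdot K') \quad (7).$$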
We proceed by case analysis on $S$. We just detail the case of the conditional command, as the remaining cases have similar proofs. If $S = \mathbf{if}\ e_1 < e_2\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2$ then (4) is rewritten as follows:

$$C = C_1 \stackrel{i'}{\cdot} \mathcal{C}(e_1) \cdot \mathcal{C}(e_2) \cdot \mathtt{bge}(k_1) \stackrel{a}{\cdot} \mathcal{C}(S_1) \stackrel{b}{\cdot} \mathtt{branch}(k_2) \stackrel{c}{\cdot} \mathcal{C}(S_2) \stackrel{i''}{\cdot} C_2$$

where $c = a + k_1$ and $i'' = c + k_2$. We distinguish two cases according to the evaluation of the boolean condition. We describe the case $(e_1 < e_2) \Downarrow \mathit{true}$. We set $j = a$.

• The instance of (1) is $(S, K, s) \to (S_1, K, s)$.

• The reduction required in (6) takes the form $C \vdash (i, \sigma, s) \stackrel{*}{\to} (i', \sigma, s) \stackrel{*}{\to} (a, \sigma, s')$, and it follows from (3'), the fact that $(e_1 < e_2) \Downarrow \mathit{true}$, and Proposition 3(2).

• Property (7) follows from Lemma 22(2), fact (5), and the following proof tree:

$$\frac{j \stackrel{C}{\leadsto} j \qquad \dfrac{b \stackrel{C}{\leadsto} i'' \qquad R(C, i'', K)}{R(C, b, K)}}{R(C, j, S_1 \cdot K)}$$

□



A.3 Proof of Proposition 5

We actually prove that for any expression $e$, statement $S$, and program $P$ the following holds:

(1) For any $n \in \mathbb{N}$ there is a unique $h$ such that $\mathcal{C}(e) : h$, $h(0) = n$, and $h(|\mathcal{C}(e)|) = h(0) + 1$.

(2) There is a unique $h$ such that $\mathcal{C}(S) : h$, $h(0) = 0$, and $h(|\mathcal{C}(S)|) = 0$.

(3) There is a unique $h$ such that $\mathcal{C}(P) : h$.

A.4 Proof of Proposition 7

(1) By induction on the structure of the command $S$.

(2) By iterating the following proposition.

Proposition 23 If $(S, K, s) \stackrel{t}{\to} (S', K', s')$ and $R(C, i, S \cdot K)$ with $t = \ell$ or $t = \epsilon$, then $C \vdash (i, \sigma, s) \stackrel{t}{\Rightarrow} (j, \sigma, s')$ and $R(C, j, S' \cdot K')$.

This is an extension of Proposition 4 and it is proven in the same way, with an additional case for labelled commands. □

A.5 Proof of Proposition 9

(1) The compilation of the Vm instruction $\mathtt{nop}(\ell)$ is the Mips instruction $(\mathtt{nop}\ \ell)$.

(2) By iterating the following proposition.

Proposition 24 Let $C : h$ be a well-formed code. If $C \vdash (i, \sigma, s) \stackrel{t}{\to} (j, \sigma', s')$ with $t = \ell$ or $t = \epsilon$, $h(i) = |\sigma|$ and $m \Vdash \sigma, s$, then $\mathcal{C}'(C) \vdash (p(i, C), m) \stackrel{t}{\Rightarrow} (p(j, C), m')$ and $m' \Vdash \sigma', s'$.

This is an extension of Proposition 6 and it is proven in the same way, with an additional case for the nop instruction. □
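A.6 Proof of Proposition 10

We extend the instrumentation to the continuations by defining:

$$\mathcal{I}(S \cdot K) = \mathcal{I}(S) \cdot \mathcal{I}(K) \qquad \mathcal{I}(\mathtt{halt}) = \mathtt{halt}.$$

Then we examine the possible reductions of a configuration $(\mathcal{I}(S), \mathcal{I}(K), s[c/\mathit{cost}])$.

• If $S$ is an unlabelled statement such as $\mathbf{while}\ b\ \mathbf{do}\ S'$ then $\mathcal{I}(S) = \mathbf{while}\ b\ \mathbf{do}\ \mathcal{I}(S')$ and, assuming $(b, s) \Downarrow \mathit{true}$, the reduction step is:

$$(\mathcal{I}(S), \mathcal{I}(K), s[c/\mathit{cost}]) \to (\mathcal{I}(S'), \mathcal{I}(S) \cdot \mathcal{I}(K), s[c/\mathit{cost}]).$$

Noticing that $\mathcal{I}(S) \cdot \mathcal{I}(K) = \mathcal{I}(S \cdot K)$, this step is matched in the labelled language as follows:

$$(S, K, s[c/\mathit{cost}]) \to (S', S \cdot K, s[c/\mathit{cost}]).$$

• On the other hand, if $S = \ell : S'$ is a labelled statement then $\mathcal{I}(S) = \mathit{inc}(\ell); \mathcal{I}(S')$ and, by a sequence of reduction steps, we have:

$$(\mathcal{I}(S), \mathcal{I}(K), s[c/\mathit{cost}]) \stackrel{*}{\to} (\mathcal{I}(S'), \mathcal{I}(K), s[c + \kappa(\ell)/\mathit{cost}]).$$

This step is matched by the labelled reduction:

$$(S, K, s[c/\mathit{cost}]) \stackrel{\ell}{\to} (S', K, s[c/\mathit{cost}]).$$

□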

A.7 Proof of Proposition 12

By diagram chasing using Propositions 7(1) and 9(1), and Definition 11 of the labelling. □

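A.8 Proof of Proposition 13

Suppose that:

$$(\mathcal{I}(\mathcal{L}(P)), s[c/\mathit{cost}]) \Downarrow s'[c + \delta/\mathit{cost}] \quad \text{and} \quad m \Vdash s[c/\mathit{cost}].$$

Then, by Proposition 10, for some $\Lambda$:

$$(\mathcal{L}(P), s[c/\mathit{cost}]) \Downarrow (s'[c/\mathit{cost}], \Lambda) \quad \text{and} \quad \kappa(\Lambda) = \delta.$$

Finally, by Propositions 7(2) and 9(2):

$$(\mathcal{C}'(\mathcal{C}(\mathcal{L}(P))), m) \Downarrow (m', \Lambda) \quad \text{and} \quad m' \Vdash s'[c/\mathit{cost}].$$

□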

A.9 Proof of Proposition 16

If $\Lambda = \ell_1 \cdots \ell_n$ then the computation is the concatenation of simple paths labelled with $\ell_1, \ldots, \ell_n$. Since $\kappa(\ell_i)$ bounds the cost of a simple path labelled with $\ell_i$, the cost of the overall computation is bounded by $\kappa(\Lambda) = \kappa(\ell_1) + \cdots + \kappa(\ell_n)$. □
A.10 Proof of Proposition 18

Same proof as Proposition 16, replacing the word "bounds" by "is exactly" and the words "bounded by" by "exactly". □

A.11 Proof of Proposition 19

In both labellings under consideration the root node is labelled. An obvious observation is that only commands of the shape $\mathbf{while}\ b\ \mathbf{do}\ S$ introduce loops in the compiled code. We notice that both labellings introduce a label in the loop (though at different places). Thus all loops go through a label and the compiled code is always sound.

To show the precision of the second labelling $\mathcal{L}_p$, we note the following property.

Lemma 25 A soundly labelled graph is precise if each label occurs at most once in the graph and if the immediate successors of the bge nodes are either halt (no successor) or labelled nodes.

Indeed, in such a graph, starting from a labelled node we can follow a unique path up to a leaf, another labelled node, or a bge node. In the last case, the hypotheses in Lemma 25 guarantee that the two simple paths one can follow from the bge node have the same length/cost. □

A.12 Proof of Proposition 21

By applying consecutively Proposition 13 and Propositions 16 or 18. □
B A C compiler



This section gives an informal overview of the compiler; in particular, it highlights the main features of the intermediate languages, the purpose of the compilation steps, and the optimisations.

B.1 Clight

Clight is a large subset of the C language that we adopt as the source language of our compiler. It features most of the types and operators of C. It includes pointer arithmetic, pointers to functions, and struct and union types, as well as all C control structures. The main difference with the C language is that Clight expressions are side-effect free, which means that side-effect operators (=, +=, ++, ...) and function calls within expressions are not supported. Given a C program, we rely on the CIL tool [13] to deal with the idiosyncrasies of C concrete syntax and to produce an equivalent program in Clight abstract syntax. We refer to the CompCert project [9] for a formal definition of the Clight language. Here we just recall in Figure 1 its syntax, which is classically structured in expressions, statements, functions, and whole programs. In order to limit the implementation effort, our current compiler for Clight does not cover the operators relating to the floating point type float. So, in a nutshell, the fragment of C we have implemented is Clight without floating point.

B.2 Cminor 

Cminor is a simple, low-level imperative language, comparable to a stripped-down, typeless variant of C. Again we refer to the CompCert project for its formal definition and we just recall in Figure 2 its syntax, which, as for Clight, is structured in expressions, statements, functions, and whole programs.

Translation of Clight to Cminor. As stack operations are made explicit in Cminor, one has to know which variables are stored in the stack. This information is produced by a static analysis that determines the variables whose address may be 'taken'. Also, space is reserved for local arrays and structures. In a second step, the proper compilation is performed: it mainly consists in translating Clight control structures to the basic ones available in Cminor.
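The address-taken analysis can be sketched as a simple traversal collecting the variables that occur under `&`. The OCaml sketch below works on a hypothetical Clight fragment and ignores local arrays and structures, which, according to the text, get stack space regardless.

```ocaml
(* Hypothetical fragment of Clight expressions and statements. *)
type expr =
  | Var of string
  | Const of int
  | Deref of expr                 (* *a *)
  | Addr of expr                  (* &a *)
  | Field of expr * string        (* a.id *)
  | Binop of string * expr * expr

type stmt =
  | Skip
  | Assign of expr * expr
  | Seq of stmt * stmt
  | If of expr * stmt * stmt
  | While of expr * stmt

module S = Set.Make (String)

(* Variables whose address may be taken: x in &x or &x.f, which must then
   live in the stack frame. Under *, the address of the pointer itself is
   not taken, so the flag is reset. *)
let rec addr_expr (under_amp : bool) (e : expr) : S.t =
  match e with
  | Var x -> if under_amp then S.singleton x else S.empty
  | Const _ -> S.empty
  | Deref a -> addr_expr false a
  | Addr a -> addr_expr true a
  | Field (a, _) -> addr_expr under_amp a
  | Binop (_, a, b) -> S.union (addr_expr false a) (addr_expr false b)

let rec addr_stmt (s : stmt) : S.t =
  match s with
  | Skip -> S.empty
  | Assign (a, b) -> S.union (addr_expr false a) (addr_expr false b)
  | Seq (s1, s2) -> S.union (addr_stmt s1) (addr_stmt s2)
  | If (c, s1, s2) -> S.union (addr_expr false c) (S.union (addr_stmt s1) (addr_stmt s2))
  | While (c, body) -> S.union (addr_expr false c) (addr_stmt body)
```

Variables outside this set can be kept in pseudo registers after the translation, while the others are allocated in the stack frame.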

B.3 RTLAbs 

RTLAbs is the last architecture-independent language in the compilation process. It is a rather straightforward abstraction of the architecture-dependent RTL intermediate language available in the CompCert project, and it is intended to factorize some work common to the various target assembly languages (e.g. optimizations) and thus to make retargeting of the compiler a simpler matter.

We stress that in RTLAbs the structure of Cminor expressions is lost and that this may have a negative impact on the following instruction selection step. Still, the subtleties of instruction selection seem rather orthogonal to our goals, and we deem the possibility of easily retargeting the compiler more important than the efficiency of the generated code.
Expressions:   a ::= id                     variable identifier
                   | n                      integer constant
                   | sizeof(τ)              size of a type
                   | op1 a                  unary arithmetic operation
                   | a op2 a                binary arithmetic operation
                   | *a                     pointer dereferencing
                   | a.id                   field access
                   | &a                     taking the address of
                   | (τ)a                   type cast
                   | a ? a : a              conditional expression

Statements:    s ::= skip                   empty statement
                   | a = a                  assignment
                   | a = a(a*)              function call
                   | a(a*)                  procedure call
                   | s ; s                  sequence
                   | if a then s else s     conditional
                   | switch a sw            multi-way branch
                   | while a do s           "while" loop
                   | do s while a           "do" loop
                   | for(s,a,s) s           "for" loop
                   | break                  exit from current loop
                   | continue               next iteration of the current loop
                   | return a?              return from current function
                   | goto lbl               branching
                   | lbl : s                labelled statement

Switch cases:  sw ::= default : s           default case
                   | case n : s; sw         labelled case

Variable declarations: dcl ::= (τ id)*      type and name

Functions:     Fd ::= τ id(dcl){dcl; s}     internal function
                   | extern τ id(dcl)       external function

Programs:      P ::= dcl; Fd*; main = id    global variables, functions, entry point

Figure 1: Syntax of the Clight language
Signatures:    sig ::= sig int* (int | void)            arguments and result

Expressions:   a ::= id                                 local variable
                   | n                                  integer constant
                   | addrsymbol(id)                     address of global symbol
                   | addrstack(δ)                       address within stack data
                   | op1 a                              unary arithmetic operation
                   | op2 a a                            binary arithmetic operation
                   | κ[a]                               memory read
                   | a ? a : a                          conditional expression

Statements:    s ::= skip                               empty statement
                   | id = a                             assignment
                   | κ[a] = a                           memory write
                   | id? = a(a*) : sig                  function call
                   | tailcall a(a*) : sig               function tail call
                   | return(a?)                         function return
                   | s; s                               sequence
                   | if a then s else s                 conditional
                   | loop s                             infinite loop
                   | block s                            block delimiting exit constructs
                   | exit n                             terminate the (n+1)th enclosing block
                   | switch a tbl                       multi-way test and exit
                   | lbl : s                            labelled statement
                   | goto lbl                           jump to a label

Switch tables: tbl ::= default : exit(n)
                   | case i : exit(n); tbl

Functions:     Fd ::= internal sig id* id* n s          internal function: signature, parameters,
                                                        local variables, stack size and body
                   | external id sig                    external function

Programs:      P ::= prog (id = data)* (id = Fd)* id    global variables, functions and entry point

Figure 2: Syntax of the Cminor language
return_type  ::= int | void
signature    ::= (int →)* return_type
memq         ::= int8s | int8u | int16s | int16u | int32
fun_ref      ::= fun_name | psd_reg

instruction  ::= skip → node                                       (no instruction)
               | psd_reg := op(psd_reg*) → node                     (operation)
               | psd_reg := &var_name → node                        (address of a global)
               | psd_reg := &locals[n] → node                       (address of a local)
               | psd_reg := fun_name → node                         (address of a function)
               | psd_reg := memq(psd_reg[psd_reg]) → node           (memory load)
               | memq(psd_reg[psd_reg]) := psd_reg → node           (memory store)
               | psd_reg := fun_ref(psd_reg*) : signature → node    (function call)
               | fun_ref(psd_reg*) : signature                      (function tail call)
               | test op(psd_reg*) → node, node                     (branch)
               | return psd_reg?                                    (return)

fun_def      ::= fun_name(psd_reg*) : signature
                   result : psd_reg?
                   locals : psd_reg*
                   stack : n
                   entry : node
                   exit : node
                   (node : instruction)*

init_datum   ::= reserve(n) | int8(n) | int16(n) | int32(n)
init_data    ::= init_datum+
global_decl  ::= var var_name{init_data}
fun_decl     ::= extern fun_name(signature) | fun_def
program      ::= global_decl*
                 fun_decl*

Table 9: Syntax of the RTLAbs language

Syntax. In RTLAbs, programs are represented as control flow graphs (CFGs for short). We associate with the nodes of the graphs instructions reflecting the Cminor commands. As usual, commands that change the control flow of the program (e.g. loops, conditionals) are translated by inserting suitable branching instructions in the CFG. The syntax of the language is depicted in Table 9. Local variables are now represented by pseudo registers that are available in unbounded number. The grammar rule op, which is not detailed in Table 9, defines the usual arithmetic and boolean operations (+, xor, <, etc.) as well as constants and conversions between sized integers.



Translation of Cminor to RTLAbs. Translating Cminor programs to RTLAbs programs mainly consists in transforming Cminor commands into CFGs. Most commands are sequential and have a rather straightforward linear translation. A conditional is translated into a branch instruction; a loop is translated using a back edge in the CFG.
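The following OCaml sketch shows the shape of this translation on a hypothetical Cminor-like fragment and a simplified CFG: the graph is built backwards from an exit node, a conditional becomes a test node, and a loop becomes a test node that the body jumps back to. For brevity the source fragment has a `while` construct (Cminor itself uses `loop`, `block` and `exit`), and the instruction set is an assumption, far smaller than Table 9.

```ocaml
(* Hypothetical simplified CFG: nodes are integers, and every instruction
   names its successor(s), in the style of Table 9. *)
type node = int

type instr =
  | ISkip of node                       (* skip -> node *)
  | IAssign of string * string * node   (* r := e -> node, e kept as text *)
  | ITest of string * node * node       (* test b -> node_true, node_false *)
  | IReturn                             (* exit node *)

(* Hypothetical Cminor-like source, with while for brevity. *)
type cmd =
  | Skip
  | Assign of string * string
  | Seq of cmd * cmd
  | If of string * cmd * cmd
  | While of string * cmd

(* Returns the graph, the entry node and the exit node. [tr c k] adds the
   nodes of c so that control then continues at k, and returns c's entry. *)
let translate (c : cmd) : (node, instr) Hashtbl.t * node * node =
  let g = Hashtbl.create 16 and next = ref 0 in
  let add i = let n = !next in incr next; Hashtbl.replace g n i; n in
  let rec tr c k =
    match c with
    | Skip -> k
    | Assign (x, e) -> add (IAssign (x, e, k))
    | Seq (c1, c2) -> tr c1 (tr c2 k)
    | If (b, c1, c2) -> add (ITest (b, tr c1 k, tr c2 k))
    | While (b, body) ->
        (* allocate the loop head first, so that the body can jump back to
           it (the back edge); then overwrite it with the actual test *)
        let head = add (ISkip k) in
        let body_entry = tr body head in
        Hashtbl.replace g head (ITest (b, body_entry, k));
        head
  in
  let exit_node = add IReturn in
  let entry = tr c exit_node in
  (g, entry, exit_node)
```

Building the graph backwards means every node already knows its successor when it is created; only loop heads need to be patched, which is exactly where the back edge appears.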
size         ::= Byte | HalfWord | Word
fun_ref      ::= fun_name | psd_reg

instruction  ::= skip → node                                    (no instruction)
               | psd_reg := n → node                            (constant)
               | psd_reg := unop(psd_reg) → node                (unary operation)
               | psd_reg := binop(psd_reg, psd_reg) → node      (binary operation)
               | psd_reg := &globals[n] → node                  (address of a global)
               | psd_reg := &locals[n] → node                   (address of a local)
               | psd_reg := fun_name → node                     (address of a function)
               | psd_reg := size(psd_reg[n]) → node             (memory load)
               | size(psd_reg[n]) := psd_reg → node             (memory store)
               | psd_reg := fun_ref(psd_reg*) → node            (function call)
               | fun_ref(psd_reg*)                              (function tail call)
               | test uncon(psd_reg) → node, node               (branch unary condition)
               | test bincon(psd_reg, psd_reg) → node, node     (branch binary condition)
               | return psd_reg?                                (return)

fun_def      ::= fun_name(psd_reg*)
                   result : psd_reg?
                   locals : psd_reg*
                   stack : n
                   entry : node
                   exit : node
                   (node : instruction)*

program      ::= globals : n
                 fun_def*

Table 10: Syntax of the RTL language

B.4 RTL 

As in RTLAbs, the structure of RTL programs is based on CFGs. RTL is the first architecture-dependent intermediate language of our compiler which, in its current version, targets the Mips assembly language.

Syntax. RTL is very close to RTLAbs. It is based on CFGs and makes explicit the Mips instructions corresponding to the RTLAbs instructions. Type information disappears: everything is represented using 32-bit integers. Moreover, each global of the program is associated with an offset. The syntax of the language can be found in Table 10. The grammar rules unop, binop, uncon, and bincon respectively represent the sets of unary operations, binary operations, unary conditions and binary conditions of the Mips language.

Translation of RTLAbs to RTL. This translation is mostly straightforward. An RTLAbs instruction is often directly translated to a corresponding Mips instruction. There are a few exceptions: some RTLAbs instructions are expanded into two or more Mips instructions. When the translation of an RTLAbs instruction requires more than a few simple Mips instructions, it is translated into a call to a function defined in the preamble of the compilation result.

B.5 ERTL 

As in RTL, the structure of ERTL programs is based on CFGs. ERTL makes explicit the calling conventions of the Mips assembly language.
size         ::= Byte | HalfWord | Word
fun_ref      ::= fun_name | psd_reg

instruction  ::= skip → node                                    (no instruction)
               | New Frame → node                               (frame creation)
               | Del Frame → node                               (frame deletion)
               | psd_reg := stack[slot, n] → node               (stack load)
               | stack[slot, n] := psd_reg → node               (stack store)
               | hdw_reg := psd_reg → node                      (pseudo to hardware)
               | psd_reg := hdw_reg → node                      (hardware to pseudo)
               | psd_reg := n → node                            (constant)
               | psd_reg := unop(psd_reg) → node                (unary operation)
               | psd_reg := binop(psd_reg, psd_reg) → node      (binary operation)
               | psd_reg := fun_name → node                     (address of a function)
               | psd_reg := size(psd_reg[n]) → node             (memory load)
               | size(psd_reg[n]) := psd_reg → node             (memory store)
               | fun_ref(n) → node                              (function call)
               | fun_ref(n)                                     (function tail call)
               | test uncon(psd_reg) → node, node               (branch unary condition)
               | test bincon(psd_reg, psd_reg) → node, node     (branch binary condition)
               | return b                                       (return)

fun_def      ::= fun_name(n)
                   locals : psd_reg*
                   stack : n
                   entry : node
                   (node : instruction)*

program      ::= globals : n
                 fun_def*

Table 11: Syntax of the ERTL language



Syntax. The syntax of the language is given in Table 11. The main difference between RTL and ERTL is the use of hardware registers. Parameters are passed in specific hardware registers; if there are too many parameters, the remaining ones are stored on the stack. Other hardware registers also have a conventional role: one holds the result of a function, one holds the base address of the globals, one holds the address of the top of the stack, and some must be saved when entering a function and restored when leaving it. Following these conventions, function calls no longer list their parameters; they only mention their number. Two new instructions (NewFrame and DelFrame) allocate and deallocate the stack space that a function needs in order to execute. They come with two instructions that fetch or assign a value in the parameter sections of the stack; these cannot yet be translated into regular load and store instructions because the final size of the stack area of each function is not yet known. Finally, the return instruction has a boolean argument that tells whether the result of the function may later be used (this is exploited for optimizations).
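The effect of these conventions on a call site can be sketched as follows. The sketch assumes, hypothetically, four argument registers `$a0` to `$a3` as in the MIPS o32 convention; the text does not say how many parameters the prototype passes in registers. Note how the final `Call` keeps only the number of arguments.

```ocaml
type psd = int (* pseudo register *)

(* A hypothetical fragment of ERTL instructions used around a call. *)
type ertl_instr =
  | Move_to_hdw of string * psd   (* hdw_reg := psd_reg *)
  | Set_stack_param of int * psd  (* stack[param, n] := psd_reg *)
  | Call of string * int          (* fun_ref(n): only the arity is kept *)

(* Assumption: four argument registers, as in the MIPS o32 convention. *)
let arg_regs = [ "$a0"; "$a1"; "$a2"; "$a3" ]

(* Lower an RTL call f(args) to ERTL: the first arguments go to hardware
   registers, the remaining ones to the parameter section of the stack. *)
let call_sequence (f : string) (args : psd list) : ertl_instr list =
  let k = List.length arg_regs in
  let pass i a =
    if i < k then Move_to_hdw (List.nth arg_regs i, a)
    else Set_stack_param (i - k, a)
  in
  List.mapi pass args @ [ Call (f, List.length args) ]
```

For a five-argument call, four moves to `$a0` to `$a3` are followed by one stack store and `Call (f, 5)`.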

Translation of RTL to ERTL. The translation consists of making explicit the conventions mentioned above. These conventions come into play when entering, calling and leaving a function, and when referencing a global variable or the address of a local variable.
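A minimal sketch of the function-entry part of this translation, under assumptions of the same kind: MIPS o32-style argument registers and an abbreviated, hypothetical list of callee-saved registers. The generated ERTL code allocates the frame, copies the callee-saved registers into fresh pseudo registers so that they can be restored on exit, and fetches each parameter from a hardware register or from the stack. The names `entry_sequence` and `fresh` are illustrative, not the compiler's API.

```ocaml
type psd = int (* pseudo register *)

type ertl_instr =
  | NewFrame
  | Move_from_hdw of psd * string  (* psd_reg := hdw_reg *)
  | Get_stack_param of psd * int   (* psd_reg := stack[param, n] *)

(* Assumptions: MIPS o32-style argument registers and an abbreviated list
   of callee-saved registers. *)
let arg_regs = [ "$a0"; "$a1"; "$a2"; "$a3" ]
let callee_saved = [ "$s0"; "$s1" ]

(* Entry sequence for a function whose RTL parameters are [params]:
   allocate the frame, save each callee-saved register into a fresh pseudo
   register, then fetch the parameters from registers or from the stack. *)
let entry_sequence (fresh : unit -> psd) (params : psd list) : ertl_instr list =
  let saves = List.map (fun r -> Move_from_hdw (fresh (), r)) callee_saved in
  let k = List.length arg_regs in
  let fetch i p =
    if i < k then Move_from_hdw (p, List.nth arg_regs i)
    else Get_stack_param (p, i - k)
  in
  (NewFrame :: saves) @ List.mapi fetch params
```

The matching exit sequence would move the saved pseudo registers back into `$s0` and `$s1` before `DelFrame`.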

size        ::= Byte | HalfWord | Word
fun_ref     ::= fun_name | hdw_reg
instruction ::= skip → node                                  (no instruction)
              | NewFrame → node                              (frame creation)
              | DelFrame → node                              (frame deletion)
              | hdw_reg := n → node                          (constant)
              | hdw_reg := unop(hdw_reg) → node              (unary operation)
              | hdw_reg := binop(hdw_reg, hdw_reg) → node    (binary operation)
              | hdw_reg := fun_name → node                   (address of a function)
              | hdw_reg := size(hdw_reg[n]) → node           (memory load)
              | size(hdw_reg[n]) := hdw_reg → node           (memory store)
              | fun_ref() → node                             (function call)
              | fun_ref()                                    (function tail call)
              | test uncon(hdw_reg) → node, node             (branch unary condition)
              | test bincon(hdw_reg, hdw_reg) → node, node   (branch binary condition)
              | return                                       (return)
fun_def     ::= fun_name(n)
                locals: n
                stack: n
                entry: node
                (node: instruction)*
program     ::= globals: n
                fun_def*

Table 12: Syntax of the LTL language

Optimizations. A liveness analysis is performed on ERTL in order to replace unused instructions by skip. An instruction is considered unused when it assigns a register that is not read afterwards. The result of the liveness analysis is also exploited by a register allocation algorithm, which efficiently associates a physical location (a hardware register or an address in the stack) with each pseudo register of the program.
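The liveness analysis and the replacement of dead assignments by skip can be sketched as a standard backward dataflow fixpoint. The CFG type below is a hypothetical simplification of ERTL in which every assignment is assumed to have no effect besides writing its destination register; a real pass must keep instructions with other effects (calls, stores) even when their result is dead.

```ocaml
module IntSet = Set.Make (Int)
module IntMap = Map.Make (Int)

type reg = int
type node = int

(* A hypothetical, stripped-down ERTL-like CFG. *)
type instr =
  | Skip of node
  | Assign of reg * reg list * node  (* d := op(sources) -> next *)
  | Test of reg list * node * node   (* branch reading its sources *)
  | Return of reg list               (* registers read on return *)

let uses = function
  | Skip _ -> IntSet.empty
  | Assign (_, rs, _) | Test (rs, _, _) | Return rs -> IntSet.of_list rs

let defs = function Assign (d, _, _) -> IntSet.singleton d | _ -> IntSet.empty

let succs = function
  | Skip n | Assign (_, _, n) -> [ n ]
  | Test (_, n1, n2) -> [ n1; n2 ]
  | Return _ -> []

(* live_out of an instruction: union of live_in over its successors. *)
let live_out live_in i =
  List.fold_left
    (fun acc s -> IntSet.union acc (IntMap.find s live_in))
    IntSet.empty (succs i)

(* live_in(n) = uses(n) union (live_out(n) minus defs(n)), iterated to a fixpoint. *)
let liveness (g : instr IntMap.t) : IntSet.t IntMap.t =
  let live_in = ref (IntMap.map (fun _ -> IntSet.empty) g) in
  let changed = ref true in
  while !changed do
    changed := false;
    IntMap.iter
      (fun n i ->
        let inn = IntSet.union (uses i) (IntSet.diff (live_out !live_in i) (defs i)) in
        if not (IntSet.equal inn (IntMap.find n !live_in)) then begin
          live_in := IntMap.add n inn !live_in;
          changed := true
        end)
      g
  done;
  !live_in

(* Replace every assignment whose destination is dead by a skip. *)
let eliminate_dead (g : instr IntMap.t) : instr IntMap.t =
  let live_in = liveness g in
  IntMap.map
    (function
      | Assign (d, _, next) as i when not (IntSet.mem d (live_out live_in i)) -> Skip next
      | i -> i)
    g
```

For instance, in the graph r1 := ...; r2 := f(r1); r3 := g(r1); return r3, only the assignment to r2 is replaced by a skip.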



B.6 LTL

As in ERTL, the structure of LTL programs is based on CFGs. Pseudo registers are no longer used; they are replaced by physical locations (a hardware register or an address in the stack).

Syntax. With a few exceptions, the instructions of the language are those of ERTL with hardware registers replacing pseudo registers. Since the calling and returning conventions were made explicit in ERTL, function calls and returns do not need parameters in LTL. The syntax is defined in Table 12.

Translation of ERTL to LTL. The translation relies on the results of the liveness analysis and of the register allocation. Unused instructions are eliminated and each pseudo register is replaced by a physical location. In LTL, the size of the stack frame of a function is known; the instructions that load or store values in the stack are therefore translated using regular load and store instructions.

Optimizations. A graph compression algorithm removes the empty instructions generated by previous compilation passes and by the liveness analysis.
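A minimal sketch of graph compression on a simplified LTL-like CFG: every edge is redirected past chains of skip nodes, and the nodes that become unreachable are dropped. This is one plausible way to implement the pass, assumed for illustration; the constructors and the functions `resolve` and `compress` are hypothetical.

```ocaml
module IntMap = Map.Make (Int)

type node = int

(* A hypothetical, minimal LTL-like CFG. *)
type instr =
  | Skip of node                  (* empty instruction *)
  | Op of string * node           (* any non-branching instruction *)
  | Test of string * node * node  (* conditional branch *)
  | Return

(* Follow a chain of skips from [n] to the first node that is not a skip.
   [seen] records the chain, so that a cycle made only of skips stops the walk. *)
let rec resolve g seen n =
  match IntMap.find n g with
  | Skip m when not (List.mem m (n :: seen)) -> resolve g (n :: seen) m
  | _ -> n

(* Redirect every edge past skip chains, then keep only the nodes that are
   still reachable from the (possibly new) entry node. *)
let compress (entry : node) (g : instr IntMap.t) : node * instr IntMap.t =
  let r = resolve g [] in
  let redirect = function
    | Skip m -> Skip (r m)
    | Op (s, m) -> Op (s, r m)
    | Test (c, m1, m2) -> Test (c, r m1, r m2)
    | Return -> Return
  in
  let g' = IntMap.map redirect g in
  let rec reach acc n =
    if IntMap.mem n acc then acc
    else
      let i = IntMap.find n g' in
      let acc = IntMap.add n i acc in
      match i with
      | Skip m | Op (_, m) -> reach acc m
      | Test (_, m1, m2) -> reach (reach acc m1) m2
      | Return -> acc
  in
  let entry' = r entry in
  (entry', reach IntMap.empty entry')
```

On the graph 0: skip, 1: op a, 2: skip, 3: skip, 4: return, the entry moves to node 1 and only nodes 1 and 4 remain, with 1 pointing directly to 4.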
size        ::= Byte | HalfWord | Word
fun_ref     ::= fun_name | hdw_reg
instruction ::= NewFrame                              (frame creation)
              | DelFrame                              (frame deletion)
              | hdw_reg := n                          (constant)
              | hdw_reg := unop(hdw_reg)              (unary operation)
              | hdw_reg := binop(hdw_reg, hdw_reg)    (binary operation)
              | hdw_reg := fun_name                   (address of a function)
              | hdw_reg := size(hdw_reg[n])           (memory load)
              | size(hdw_reg[n]) := hdw_reg           (memory store)
              | call fun_ref                          (function call)
              | tailcall fun_ref                      (function tail call)
              | uncon(hdw_reg) → node                 (branch unary condition)
              | bincon(hdw_reg, hdw_reg) → node       (branch binary condition)
              | mips_label:                           (Mips label)
              | goto mips_label                       (goto)
              | return                                (return)
fun_def     ::= fun_name(n)
                locals: n
                instruction*
program     ::= globals: n
                fun_def*

Table 13: Syntax of the LIN language



B.7 LIN 



In LIN, the structure of a program is no longer based on CFGs. Every function is represented 
as a sequence of instructions. 

Syntax. The instructions of LIN are very close to those of LTL. Labels, gotos and branch instructions handle the changes in the control flow. The syntax of LIN programs is shown in Table 13.

Translation of LTL to LIN. This translation amounts to transforming, in an efficient way, the graph structure of each function into a linear sequence of instructions.
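One common way to linearize a CFG is a depth-first traversal that lets the false branch of a test fall through and emits a goto whenever control reaches an already placed node. The sketch below follows that scheme on hypothetical, simplified LTL and LIN instruction types; it is not necessarily the strategy used by the prototype, and it emits a label for every node, leaving the removal of unused labels to a later step.

```ocaml
type node = int

(* Hypothetical minimal LTL (graph) and LIN (sequence) instructions. *)
type ltl = Op of string * node | Test of string * node * node | Return

type lin =
  | Label of node
  | Lop of string
  | Branch of string * node  (* jump to the label if the condition holds *)
  | Goto of node
  | Lreturn

(* Depth-first linearization: the false successor of a test is placed right
   after it (fall-through), and a goto is emitted whenever control would
   reach a node that has already been placed. *)
let linearize (entry : node) (g : node -> ltl) : lin list =
  let placed = Hashtbl.create 16 in
  let code = ref [] in
  let emit i = code := i :: !code in
  let rec visit n =
    if Hashtbl.mem placed n then emit (Goto n)
    else begin
      Hashtbl.add placed n ();
      emit (Label n);
      match g n with
      | Op (s, next) -> emit (Lop s); visit next
      | Return -> emit Lreturn
      | Test (c, t, f) ->
          emit (Branch (c, t));
          visit f;
          if not (Hashtbl.mem placed t) then visit t
    end
  in
  visit entry;
  List.rev !code
```

On the loop 0: if c then 1 else 2, 1: body then 0, 2: return, it produces Label 0; Branch (c, 1); Label 2; Lreturn; Label 1; Lop body; Goto 0.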

B.8 Mips 

Mips is a rather simple assembly language. As with other assembly languages, a program in Mips is a sequence of instructions. The Mips code produced by the compilation of a Clight program starts with a preamble in which some useful, non-primitive functions are predefined (e.g. conversion from 8-bit unsigned integers to 32-bit integers). The subset of the Mips assembly language that the compilation produces is defined in Table 14.

Translation of LIN to Mips. This final translation is straightforward. Stack allocation and deallocation are made explicit, and the function definitions are sequentialized.
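Making stack allocation explicit and sequentializing the functions can be sketched as below. The sketch assumes that the stack grows downwards, that frames are managed by adjusting `$sp` with `addiu`, and that each function is emitted as its label followed by its body; the types and names are hypothetical. Using `addiu` is itself an assumption made for brevity: with only the instructions of Table 14, the frame size would first be loaded with `li`.

```ocaml
(* Hypothetical LIN fragment: only frame handling is modelled; every other
   instruction is assumed to be already printed as Mips text. *)
type lin_instr = NewFrame | DelFrame | Other of string

type fundef = { name : string; frame_size : int; body : lin_instr list }

(* Assumption: the stack grows downwards and frames are managed with addiu. *)
let expand_frame (frame_size : int) : lin_instr -> string = function
  | NewFrame -> Printf.sprintf "addiu $sp, $sp, %d" (-frame_size)
  | DelFrame -> Printf.sprintf "addiu $sp, $sp, %d" frame_size
  | Other s -> s

(* Sequentialize the function definitions: each one becomes its label
   followed by its expanded body. *)
let to_mips (funs : fundef list) : string list =
  List.concat
    (List.map
       (fun f -> (f.name ^ ":") :: List.map (expand_frame f.frame_size) f.body)
       funs)
```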

load        ::= lb | lhw | lw
store       ::= sb | shw | sw
fun_ref     ::= fun_name | hdw_reg
instruction ::= nop                                 (empty instruction)
              | li hdw_reg, n                       (constant)
              | unop hdw_reg, hdw_reg               (unary operation)
              | binop hdw_reg, hdw_reg, hdw_reg     (binary operation)
              | la hdw_reg, fun_name                (address of a function)
              | load hdw_reg, n(hdw_reg)            (memory load)
              | store hdw_reg, n(hdw_reg)           (memory store)
              | call fun_ref                        (function call)
              | uncon hdw_reg, node                 (branch unary condition)
              | bincon hdw_reg, hdw_reg, node       (branch binary condition)
              | mips_label:                         (Mips label)
              | j mips_label                        (goto)
              | return                              (return)
program     ::= globals: n
                entry: mips_label*
                instruction*

Table 14: Syntax of the Mips language

B.9 Benchmarks

To check that our prototype compiler is realistic, we performed some preliminary benchmarks on a 183 MHz MIPS 4KEc processor running a Linux-based distribution. We compared the wall-clock execution time of several simple C programs compiled with our compiler against that of the programs produced by Gcc with optimization levels 0 and 1. As shown by Figure 3, our prototype compiler produces executable programs that are on average faster than Gcc's without optimizations: they run faster than the gcc -O0 executables on four of the six benchmarks (badsort, fib, mat_det and quicksort) and slower on min and search.

Benchmark    gcc -O0      acc    gcc -O1
badsort        55.93    34.51      12.96
fib            76.24    34.28      45.68
mat_det       163.42   156.20      54.76
min            12.21    16.25       3.95
quicksort      27.46    17.95       9.41
search        463.19   623.79     155.38

Figure 3: Benchmark results (execution time is given in seconds).
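The comparison in Figure 3 can be summarized per benchmark by the ratio between the execution time of the code produced by acc and that of gcc -O0. The short OCaml script below recomputes these ratios from the times in the figure, together with their geometric mean; the choice of the geometric mean as a summary is an assumption of this sketch, not the authors' method.

```ocaml
(* Execution times in seconds, copied from Figure 3:
   (name, gcc -O0, acc, gcc -O1). *)
let results = [
  ("badsort", 55.93, 34.51, 12.96);
  ("fib", 76.24, 34.28, 45.68);
  ("mat_det", 163.42, 156.20, 54.76);
  ("min", 12.21, 16.25, 3.95);
  ("quicksort", 27.46, 17.95, 9.41);
  ("search", 463.19, 623.79, 155.38);
]

(* Ratio acc / gcc -O0 for each benchmark: below 1.0 means acc is faster. *)
let ratios = List.map (fun (_, o0, acc, _) -> acc /. o0) results

let geometric_mean xs =
  let n = float_of_int (List.length xs) in
  exp (List.fold_left (fun s x -> s +. log x) 0.0 xs /. n)

let wins = List.length (List.filter (fun r -> r < 1.0) ratios)

let () =
  Printf.printf "acc faster than gcc -O0 on %d of %d benchmarks\n" wins (List.length results);
  Printf.printf "geometric mean of acc / gcc -O0: %.2f\n" (geometric_mean ratios)
```

It reports four wins out of six and a geometric mean ratio of roughly 0.82. Averaging raw times instead would be dominated by search, where acc is slower: the total is 882.98 s for acc against 798.45 s for gcc -O0, which is why a per-benchmark ratio is used here.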